Machine Learning Architecture for Imaging Protocol Detector

ABSTRACT

A system includes one or more processors coupled to non-transitory memory, and the one or more processors are configured to receive a first image representing at least a portion of a mouth of a user, execute a first machine-learning architecture trained to generate a set of features from the first image, determine, based on the set of features, that the first image satisfies at least one criterion for executing a second machine-learning architecture based on the first image, and generate, based on the first image satisfying the at least one criterion, a prompt indicating feedback for capturing a second image representing at least a second portion of the mouth of the user.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Patent Application No. 17/858,734 filed Jul. 6, 2022, which is a continuation of U.S. Patent Application No. 17/401,053 filed Aug. 12, 2021, now U.S. Pat. No. 11,423,697, each of which is incorporated herein by reference in its entirety and for all purposes.

TECHNICAL FIELD

The present disclosure relates generally to a machine learning architecture for intelligently processing two-dimensional images captured of a user’s mouth, and interactively communicating with the user in order to receive improved two-dimensional images of the user’s mouth.

BACKGROUND

High quality images of a user’s mouth (e.g., mouth data, including dental and intra-oral data) can be captured using hardware and software to reveal, highlight, accentuate, or distinguish relevant portions of the user’s mouth by widening the opening defining the user’s mouth or by keeping the user’s mouth sufficiently open for capturing images. However, not all users have access to such hardware, and further, such hardware does not ensure that images of sufficient quality (e.g., high quality images) are ultimately captured. Accordingly, it can be difficult for a user to capture high quality images of the user’s mouth. Alternatively, trained professionals can advise and assist a user by positioning hardware or the user’s face, or by operating an imaging device. However, visiting a trained professional is often not convenient for users, not preferred by users, and can be expensive.

SUMMARY

An embodiment relates to a system. The system includes a capture device configured to capture a first image representing at least a portion of a mouth of a user. The system also includes a communication device configured to communicate user feedback to the user. The system also includes a processor and a non-transitory computer-readable medium containing instructions that, when executed by the processor, cause the processor to perform operations. Operations performed by the processor include receiving the first image representing at least the portion of the mouth of the user. Additional operations performed by the processor include outputting user feedback for capturing a second image representing at least a portion of the mouth of the user, where the user feedback is output in response to using a machine learning architecture to determine that an image quality score of the first image does not satisfy an image quality threshold.

Another embodiment relates to a method. The method includes receiving, by an imaging protocol algorithm of a machine learning architecture executing on one or more processors, a first image representing at least a portion of a mouth of a user. The method also includes outputting, by the machine learning architecture executing on the one or more processors, user feedback for capturing a second image representing a portion of the mouth of the user, where the machine learning architecture outputs the user feedback in response to an image quality score of the first image not satisfying an image quality threshold.

Another embodiment relates to a system. The system includes a communication device configured to capture a first image representing at least a portion of a mouth of a user and communicate the first image to a server. The system also includes a processor of the server and a non-transitory computer-readable medium containing instructions that, when executed by the processor, cause the processor to perform operations. Operations performed by the processor include receiving the first image representing at least the portion of the mouth of the user. Additional operations performed by the processor include communicating, to the communication device, user feedback for capturing a second image representing at least a portion of the mouth of the user, where the user feedback is determined in response to determining, via an imaging protocol algorithm, that an image quality score of the first image does not satisfy an image quality threshold.

This summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the devices or processes described herein will become apparent in the detailed description set forth herein, taken in conjunction with the accompanying figures, wherein like reference numerals refer to like elements.

BRIEF DESCRIPTION OF THE DRAWINGS

Various example embodiments of the present solution are described in detail below with reference to the following figures or drawings. The drawings are provided for purposes of illustration only and merely depict example arrangements of the present solution to facilitate the reader’s understanding of the present solution. Therefore, the drawings should not be considered limiting of the breadth, scope, or applicability of the present solution. It should be noted that for clarity and ease of illustration, these drawings are not necessarily drawn to scale.

FIG. 1 is a block diagram of a computer-implemented system including an image capture application utilizing a machine learning architecture, according to an illustrative embodiment.

FIG. 2 is a series of images, with each image of the series including varying characteristics of an image, according to an illustrative embodiment.

FIG. 3 is an agent-based feedback selection model, according to an illustrative embodiment.

FIG. 4 is an example of types of user feedback and a corresponding user script for each type of user feedback, according to an illustrative embodiment.

FIG. 5 is an interactive communication flow utilizing the image capture application, according to an illustrative embodiment.

FIG. 6 is a series of images and corresponding landmarked models, according to an illustrative embodiment.

FIG. 7 is a landmarked model of a user, according to an illustrative embodiment.

FIG. 8 is a block diagram of a simplified neural network model, according to an illustrative example.

FIG. 9 is a block diagram of an example system using supervised learning, according to an illustrative embodiment.

FIG. 10 is an illustration of interactive communication resulting from the implementation of the machine learning architecture of FIG. 5, according to an illustrative embodiment.

FIG. 11 is another illustration of interactive communication resulting from the implementation of the machine learning architecture of FIG. 5, according to an illustrative embodiment.

FIG. 12 is another illustration of interactive communication resulting from the implementation of the machine learning architecture of FIG. 5, according to an illustrative embodiment.

FIG. 13 is an example operational flow employing the machine learning models in series, according to an illustrative embodiment.

FIG. 14 is an illustration of a process for transmitting one or more portions of high quality images for further processing and discarding one or more portions of low quality images, resulting from the implementation of the machine learning architecture of FIG. 5, according to an illustrative embodiment.

DETAILED DESCRIPTION

Hereinafter, example arrangements will be described in more detail with reference to the accompanying drawings, in which like reference numbers refer to like elements throughout. The present disclosure, however, can be embodied in various different forms, and should not be construed as being limited to only the illustrated arrangements herein. Rather, these arrangements are provided as examples so that this disclosure will be thorough and complete, and will fully convey the aspects and features of the present disclosure to those skilled in the art. Accordingly, processes, elements, and techniques that are not necessary to those having ordinary skill in the art for a complete understanding of the aspects and features of the present disclosure may not be described. Unless otherwise noted, like reference numerals denote like elements throughout the attached drawings and the written description.

The systems and methods described herein may have many benefits over existing computing systems. For example, a machine learning architecture improves a user experience associated with capturing high quality images of the user’s mouth, reducing the costs and time associated with a user visiting trained professionals by communicating relevant feedback to the user in a user-friendly way. The interactive user-specific feedback provided to the user improves the quality of images captured by the user while decreasing the time and effort that the user spends before capturing the high quality image. For instance, the characteristics of the image (e.g., contrast, sharpness, brightness, blur) and the content of the image (visibility of teeth, mouth angle, tongue position) are evaluated by the machine learning architecture to determine whether the captured image is a high quality image. The embodiments also improve the user experience by communicating user-specific feedback. That is, the feedback incorporates available user hardware, it is directed to facilitating a particular user in capturing a high quality image in response to a received image, and it is heterogeneously communicated to the user according to user preferences. Communicating user-specific feedback reduces the computational resources consumed by a system that would otherwise communicate general feedback, by limiting the number of iterations necessary to capture a high quality image of the user’s mouth. For example, computational resources are conserved by not continuously communicating general and/or standard feedback to the user in an attempt to guide the user to capture a high quality image.

Referring now to FIG. 1, a block diagram of a computer-implemented system 100 including an image capture application utilizing a machine learning architecture is shown, according to an embodiment. The system 100 includes user device 121 and server 110. Devices and components in FIG. 1 can be added, deleted, integrated, separated, and/or rearranged in various embodiments of the disclosed inventions. For example, some components of FIG. 1 are illustrated as being executed on the user device 121; latency may be reduced by providing user feedback to the user using the user device 121. However, in some implementations, the user device 121 may be used to capture an image, and the image may be transmitted to the server 110 for processing and for providing a user feedback recommendation. That is, the operations of the circuits of the user device 121 may be performed on the server 110. Components of the user device 121 and/or server 110 may be locally installed (on the user device 121 and/or server 110), and/or may be remotely accessible (e.g., via a browser-based interface or a cloud system).

The various systems and devices may be communicatively and operatively coupled through a network 101. Network 101 may permit the direct or indirect exchange of data, values, instructions, messages, and the like (represented by the arrows in FIG. 1). The network 101 may include one or more of the Internet, a cellular network, Wi-Fi, Wi-Max, a proprietary network, any other type of wired or wireless network, or a combination of wired or wireless networks.

The user 120 may be any person using the user device 121. Such a user 120 may be a potential customer, a customer, client, patient, or account holder of an account stored in server 110, or may be a guest user with no existing account. The user device 121 includes any type of electronic device that a user 120 can access to communicate with the server 110. For example, the user device 121 may include watches (e.g., a smart watch) and computing devices (e.g., laptops, desktops, personal digital assistants (PDAs), mobile devices (e.g., smart phones)).

The server 110 may be associated with or operated by a dental institution (e.g., a dentist or an orthodontist, a clinic, a dental hardware manufacturer). The server 110 may maintain accounts held by the user 120, such as personal information accounts (patient history, patient issues, patient preferences, patient characteristics). The server 110 may include server computing systems, for example, comprising one or more networked computer servers having a processor and non-transitory machine readable media.

As shown, both the user device 121 and the server 110 may include a network interface (e.g., network interface 124A at the user device 121 and network interface 124B at the server 110, hereinafter referred to as “network interface 124”), a processing circuit (e.g., processing circuit 122A at the user device 121 and processing circuit 122B at the server 110, hereinafter referred to as “processing circuit 122”), an input/output circuit (e.g., input/output circuit 128A at the user device 121 and input/output circuit 128B at the server 110, hereinafter referred to as “input/output circuit 128”), an application programming interface (API) gateway (e.g., API gateway 123A at the user device 121 and API gateway 123B at the server 110, hereinafter referred to as “API gateway 123”), and an authentication circuit (e.g., authentication circuit 117A at the user device 121 and authentication circuit 117B at the server 110, hereinafter referred to as “authentication circuit 117”). The processing circuit 122 may include a memory (e.g., memory 119A at the user device 121 and memory 119B at the server 110, hereinafter referred to as “memory 119”), a processor (e.g., processor 129A at the user device 121 and processor 129B at the server 110, hereinafter referred to as “processor 129”), an image capture application (e.g., image capture application 125A at the user device 121 and image capture application 125B at the server 110, hereinafter referred to as “image capture application 125”), and a natural language processing (NLP) circuit (e.g., NLP circuit 108A at the user device 121 and NLP circuit 108B at the server 110, hereinafter referred to as “NLP circuit 108”).

The network interface circuit 124 may be adapted for and configured to establish a communication session via the network 101 between the user device 121 and the server 110. The network interface circuit 124 includes programming and/or hardware-based components that connect the user device 121 and/or server 110 to the network 101. For example, the network interface circuit 124 may include any combination of a wireless network transceiver (e.g., a cellular modem, a Bluetooth transceiver, a Wi-Fi transceiver) and/or a wired network transceiver (e.g., an Ethernet transceiver). In some arrangements, the network interface circuit 124 includes the hardware and machine-readable media structured to support communication over multiple channels of data communication (e.g., wireless, Bluetooth, near-field communication, etc.).

Further, in some arrangements, the network interface circuit 124 includes cryptography module(s) to establish a secure communication session (e.g., using the IPSec protocol or similar) in which data communicated over the session is encrypted and securely transmitted. In this regard, personal data (or other types of data) may be encrypted and transmitted to prevent or substantially prevent the threat of hacking or unwanted sharing of information.

To support the features of the user device 121 and/or server 110, the network interface circuit 124 provides a relatively high-speed link to the network 101, which may be any combination of a local area network (LAN), the Internet, or any other suitable communications network, directly or through another interface.

The input/output circuit 128A at the user device 121 may be configured to receive communication from a user 120 and provide outputs to the user 120. Similarly, the input/output circuit 128B at the server 110 may be configured to receive communication from an administrator (or other user such as a medical professional, such as a dentist, orthodontist, dental technician, or administrator) and provide output to that user. For example, the input/output circuit 128 may capture user responses based on a selection from a predetermined list of user inputs (e.g., drop down menu, slider, buttons), an interaction with a microphone on the user device 121, an interaction with a graphical user interface (GUI) displayed on the user device 121 (e.g., as described in FIGS. 10-12), an interaction with a light sensor, an interaction with an accelerometer, and/or an interaction with a camera. For example, a user 120 using the user device 121 may capture an image of the user 120 using a camera. The image of the user may be ingested by the user device 121 using the input/output circuit 128. Similarly, a user device 121 may interact with the light sensors on the user device such that the light sensors can collect data to determine whether the user device 121 is facing light. Further, a user 120 may interact with the accelerometer such that the accelerometer may interpret measurement data to determine whether the user 120 is shaking the user device 121, and/or may provide feedback regarding the orientation of the device and whether the user 120 is modifying the orientation of the user device 121. Feedback associated with the captured image may be output to the user using the input/output circuit 128. For example, the image capture application 125 may provide audible feedback to the user using speakers on the user device 121. Additionally or alternatively, the user 120 may interact with the GUI executed by the user device 121 using the user’s 120 voice, a keyboard/mouse (or other hardware), and/or a touch screen.

The API gateway 123 may be configured to facilitate the transmission, receipt, authentication, data retrieval, and/or exchange of data between the user device 121 and/or server 110.

Generally, an API is a software-to-software interface that allows a first computing system of a first entity (e.g., the user device 121) to utilize a defined set of resources of a second (external) computing system of a second entity (e.g., the server 110, or a third party) to, for example, access certain data and/or perform various functions. In such an arrangement, the information and functionality available to the first computing system is defined, limited, or otherwise restricted by the second computing system. To utilize an API of the second computing system, the first computing system may execute one or more APIs or API protocols to make an API “call” to (e.g., generate an API request that is transmitted to) the second computing system. The API call may be accompanied by a security or access token or other data to authenticate the first computing system and/or a particular user 120. The API call may also be accompanied by certain data/inputs to facilitate the utilization or implementation of the resources of the second computing system, such as data identifying users 120 (e.g., name, identification number, biometric data), accounts, dates, functionalities, tasks, etc.
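By way of illustration, a token-accompanied API call of this kind might look like the following minimal sketch; the endpoint URL, payload schema, and token handling are assumptions for illustration only and do not represent the actual interface of the server 110.

```python
import requests

def call_image_api(image_id: str, access_token: str) -> dict:
    # Hypothetical API call: endpoint and payload are illustrative only.
    response = requests.post(
        "https://server.example.com/api/v1/images/evaluate",
        headers={"Authorization": f"Bearer {access_token}"},  # security/access token
        json={"image_id": image_id},  # data accompanying the API call
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```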

The API gateway 123 in the user device 121 provides various functionality to other systems and devices (e.g., server 110) through APIs by accepting API calls via the API gateway 123. The API calls may be generated via an API engine of a system or device to, for example, make a request from another system or device.

For example, the image capture application 125B at the server 110 and/or a downstream application operating on the server 110 may use the API gateway 123B to communicate with the image capture application 125A. The communication may include commands to control the image capture application 125A. For example, a circuit of the image capture application 125B (e.g., the image quality circuit 133B, the protocol satisfaction circuit 106B, and/or the feedback selection circuit 105B) may produce an output that starts or stops a process (e.g., starts or stops an image capture process), or the image capture application 125A may receive automated commands. Similarly, upon the downstream application or image capture application 125B determining a certain result (e.g., a captured high quality image), the downstream application and/or image capture application 125B may send a command to the image capture application 125A via the API gateway to perform a certain operation (e.g., turn off an active camera at the user device 121).

The processing circuit 122 may include at least memory 119 and a processor 129. The memory 119 includes one or more memory devices (e.g., RAM, NVRAM, ROM, Flash Memory, hard disk storage) that store data and/or computer code for facilitating the various processes described herein. The memory 119 may be or include tangible, non-transient volatile memory and/or non-volatile memory. The memory 119 stores at least portions of instructions and data for execution by the processor 129 to control the processing circuit 122. For example, memory 119 may serve as a repository for user 120 accounts (e.g., storing user 120 name, email address, physical address, phone number, medical history), training data, thresholds, weights, and the like for the machine learning models. In other arrangements, these and other functions of the memory 119 are stored in a remote database.

The processor 129 may be implemented as a general-purpose processor, an application specific integrated circuit (ASIC), one or more field programmable gate arrays (FPGAs), a digital signal processor (DSP), a group of processing components, or other suitable electronic processing components.

The NLP circuit 108 in the processing circuit 122 may include computer-executable instructions structured to determine information extracted from an audio signal from the user 120. For example, the NLP circuit 108 may be used to interpret user inputs when the user 120 is interacting with the image capture application 125 orally. For instance, the user 120 may hold the user device 121 (e.g., at a particular position in air) and speak into a microphone or other component of the input/output circuit 128 on the user device 121. In an example, the user 120 may request that the image capture application 125 repeat the user feedback. In some configurations, the NLP circuit 108 may parse the audio signal into audio frames containing portions of audio data. The frames may be portions or segments of the audio signal having a fixed length across the time series, where the length of the frames may be pre-established or dynamically determined.

The NLP circuit 108 may also transform the audio data into a different representation. For example, the NLP circuit 108 initially generates and represents the audio signal and frames (and optionally sub-frames) according to a time domain. The NLP circuit 108 transforms the frames (initially in the time domain) to a frequency domain or spectrogram representation, representing the energy associated with the frequency components of the audio signal in each of the frames, thereby generating a transformed representation. In some implementations, the NLP circuit 108 executes a Fast-Fourier Transform (FFT) operation on the frames to transform the audio data in the time domain to the frequency domain. For each frame (or sub-frame), the NLP circuit 108 may perform a simple scaling operation so that the frame occupies the range [-1, 1] of measurable energy.

In some implementations, the NLP circuit 108 may employ a scaling function to accentuate aspects of the speech spectrum (e.g., spectrogram representation). The speech spectrum, and in particular the voiced speech, will decay at higher frequencies. The scaling function beneficially accentuates the voiced speech such that the voiced speech is differentiated from background noise in the audio signal. The NLP circuit 108 may perform an exponentiation operation on the array resulting from the FFT transformation to further distinguish the speech in the audio signal from background noise. The NLP circuit 108 may employ automatic speech recognition and/or natural language processing algorithms to interpret the audio signal.
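A minimal sketch of this frame-transform pipeline follows; the frame length, hop size, and exponent are illustrative assumptions rather than values specified by the disclosure.

```python
import numpy as np

def frame_signal(audio: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    # Parse the time-domain audio signal into fixed-length, overlapping frames.
    n_frames = 1 + max(0, (len(audio) - frame_len) // hop)
    return np.stack([audio[i * hop : i * hop + frame_len] for i in range(n_frames)])

def transform_frames(frames: np.ndarray, exponent: float = 1.5) -> np.ndarray:
    # Scale each frame so it occupies the range [-1, 1] of measurable energy.
    peak = np.max(np.abs(frames), axis=1, keepdims=True)
    scaled = frames / np.maximum(peak, 1e-9)
    # FFT: transform time-domain frames to a frequency-domain representation.
    spectra = np.abs(np.fft.rfft(scaled, axis=1))
    # Exponentiation accentuates voiced speech relative to background noise.
    return spectra ** exponent
```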

The authentication circuit 117 of the server 110 may be configured to authenticate the user 120 by authenticating information received by the user device 121. The authentication circuit 117 authenticates a user 120 as being a valid account holder associated with the server 110 (and/or the image capture application 125). In some embodiments, the authentication circuit 117 may prompt the user 120 to enter user 120 credentials (e.g., username, password, security questions, and biometric information such as fingerprints or facial recognition). The authentication circuit 117 may look up and match the information entered by the user 120 to stored/retrieved user 120 information in memory 119. For example, memory 119 may contain a lookup table matching user 120 authentication information (e.g., name, home address, IP address, MAC address, phone number, biometric data, passwords, usernames) to user 120 accounts and user 120 personal information (e.g., medical information).

The user device 121 and/or server 110 are configured to run a variety of application programs and store associated data in a database of the memory 119. One such application executed by the user device 121 and/or server 110 using the processing circuit 122 may be the image capture application 125. The image capture application 125 is structured to guide a user (e.g., user 120 using a user device 121) to capture images. The image capture application 125 may utilize and/or instruct other circuits on the user device 121, such as components of the input/output circuit 128 (e.g., a display of the user device 121, a microphone on the user device 121, a camera on the user device 121). For example, executing the image capture application 125 may result in displaying a user interface (e.g., a graphical user interface such as FIGS. 6A-6D). In some embodiments, data captured at the image capture application 125A at the user device 121 is communicated to the image capture application 125B at the server 110.

The image capture application 125 is a downloaded and installed application that includes program logic stored in a system memory (or other storage location) of the user device 121, including an image quality circuit 133, a protocol satisfaction circuit 106, and a feedback selection circuit 105. In this embodiment, the image quality circuit 133, protocol satisfaction circuit 106, and feedback selection circuit 105 are embodied as program logic (e.g., computer code, modules, etc.). The image capture application 125A is communicably coupled via the network interface circuit 124A over the network 101 to the server 110, and particularly to the image capture application 125B, which may support at least certain processes and functionalities of the image capture application 125A. Similarly, the image capture application 125B is communicably coupled via the network interface circuit 124B over the network 101 to the user device 121, and particularly to the image capture application 125A. In some embodiments, during download and installation, the image capture application 125A is stored by the memory 119A of the user device 121 and selectively executable by the processor 129A. Similarly, in some embodiments, the image capture application 125B is stored by the memory 119B of the server 110 and selectively executable by the processor 129B. The program logic may configure the processor 129 (e.g., processor 129A of the user device 121 and processor 129B of the server 110) to perform at least some of the functions discussed herein. In some embodiments, the image capture application 125 is a stand-alone application that may be downloaded and installed on the user device 121 and/or server 110. In other embodiments, the image capture application 125 may be a part of another application.

The depicted downloaded and installed configuration of the image capture application 125 is not meant to be limiting. According to various embodiments, parts (e.g., modules, etc.) of the image capture application 125 may be locally installed on the user device 121/server 110 and/or may be remotely accessible (e.g., via a browser-based interface) from the user device 121/server 110 (or other cloud system in association with the server 110). In this regard and in another embodiment, the image capture application 125 is a web-based application that may be accessed using a browser (e.g., an Internet browser provided on the user device). In still another embodiment, the image capture application 125 is hard-coded into memory such as memory 119 of the user device 121/server 110 (i.e., not downloaded for installation). In an alternate embodiment, the image capture application 125 may be embodied as a “circuit” of the user device 121, as “circuit” is defined herein.

The image capture application 125 may be configured to guide the user and control the data capture process in order to capture high quality data. The image capture application 125 guides the user 120 such that the feedback provided to the user is minimized to obtain the desired image (e.g., an image that satisfies both an image quality threshold associated with image characteristics and an image quality threshold associated with image content). That is, the user 120 is guided to capture high quality image data using feedback selected by the image capture application 125 (e.g., user feedback). The feedback selected by the image capture application 125 minimizes the number of attempts (or duration of time) that the user 120 spends attempting to capture a high quality image, minimizes the effort required by the user 120 to capture high quality images, and/or improves the user 120 experience with the image capture application 125.

For example, the image capture application 125 may request user feedback quantifying the user experience with the image capture application 125. User feedback quantifying the user experience with the image capture application 125 may include a user’s rating of the image capture application indicating the effort the user 120 experienced, the frustration the user 120 experienced, the satisfaction with the instructions provided by the image capture application 125, and the like. The image capture application 125 may determine the user 120 experience associated with using the image capture application 125 by statistically or algorithmically combining the user feedback quantifying the user 120 experience with the image capture application 125 and comparing the user feedback against a preconfigured positive user experience threshold.

The operations performed by the image capture application 125 may be executed at the user device 121, at the server 110, and/or using some combination of the user device 121 and the server 110. For example, the image capture application 125 may be executed both at the user device 121 (e.g., image capture application 125A) and the server 110 (e.g., image capture application 125B). In other implementations, the image capture application may be executed partially at the user device 121 and partially at the server 110. Additionally or alternatively, the image capture application 125 may be executed completely in the user device 121 (or server 110), and in some implementations may be run subsequently at the server 110 (or user device 121). In some implementations, the image capture application 125A may run in parallel with the image capture application 125B.

For example, to reduce the latency associated with providing feedback to the user 120, the image capture application 125 may be executed on the user device 121 such that the user 120 receives feedback related to improving the captured image in real time. That is, the time associated with the user waiting to receive feedback may be minimized (or reduced). In other implementations, a first image capture application may be executed (e.g., the image capture application 125A on the user device 121) to provide simple feedback, and a second image capture application may be executed (e.g., the image capture application 125B on the server 110) to provide more sophisticated feedback to the user 120.

The image capture application includes an image quality circuit 133. The image quality circuit 133 may evaluate the quality of a captured image (or a frame of a video data stream) with respect to the characteristics of the image. The quality of the image with respect to the characteristics of the image includes, for instance, the visibility of the image (e.g., lightness/darkness in the image, shadows in the image), the contrast of the image, the saturation of the image, the sharpness of the image, the blur of the image (e.g., motion artifacts), and/or the noise or distortion of the image.

The image quality circuit 133 may evaluate the quality of the image with respect to the characteristics of the image using a machine learning model. In one example implementation, the image quality circuit 133 may implement a Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) model. BRISQUE models are beneficial because the quality of an image affected by an unknown distortion can be evaluated. That is, the characteristics of the image (e.g., blur, contrast, brightness) do not need to be labeled/classified before the quality of the image is determined. Further, BRISQUE can be performed quickly (e.g., in real time or near real time) because of its low computational complexity.

The BRISQUE model may be trained to evaluate the quality of an image using a dataset including clean images and distorted images (e.g., images affected by pixel noise). The BRISQUE model generates an image score using support vector regression. The training images may be normalized. In some implementations, mean subtracted contrast normalization may be employed to normalize the image. Features from the normalized image may be extracted and transformed into a higher dimension (e.g., mapping the data to a new dimension, employing the “kernel trick” using sigmoid kernels, polynomial kernels, radial basis function kernels, and the like) such that the data is linearly separable. Support vector regression trains/optimizes a hyperplane to model the input image features. The hyperplane may be optimized by taking the gradient of a cost function (such as the hinge loss function) to maximize the margin of the hyperplane. Decision boundaries are determined (based on a tolerance) around the hyperplane.
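The normalization-and-regression pipeline can be sketched as follows; the Gaussian window width, the simplified feature statistics (standing in for BRISQUE's full generalized-Gaussian fits), and the training data are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVR

def mscn(image: np.ndarray, sigma: float = 7 / 6, c: float = 1.0) -> np.ndarray:
    # Mean subtracted contrast normalization of a grayscale image.
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                     # local mean
    var = gaussian_filter(image * image, sigma) - mu * mu  # local variance
    return (image - mu) / (np.sqrt(np.abs(var)) + c)

def quality_features(image: np.ndarray) -> list:
    # Simplified stand-in for BRISQUE's natural-scene-statistics features.
    coeffs = mscn(image)
    return [coeffs.mean(), coeffs.var(), np.abs(coeffs).mean(), (coeffs ** 4).mean()]

def train_quality_model(train_images, train_scores) -> SVR:
    # Support vector regression with an RBF kernel maps the extracted features
    # into a higher dimension and fits a hyperplane to the quality scores.
    X = np.array([quality_features(img) for img in train_images])
    model = SVR(kernel="rbf")
    model.fit(X, np.asarray(train_scores))
    return model
```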

In some implementations, the image quality circuit 133 can determine the characteristics of specific areas of the image. For example, the image quality circuit 133 may evaluate the image quality for different teeth in the image. In some implementations, the image quality circuit 133 may determine, using the image quality score of specific areas of the image, whether the specific areas of the image are overexposed (or too dark). In one embodiment, the image quality circuit 133 can be applied to the whole or parts of an image. For example, a model can be trained to detect a region of interest (e.g., the inside of the mouth, the molar regions, the tongue, or individual teeth), and the image quality circuit 133 can be applied to each specific region to generate a quality score map on the image, as sketched below. An example of the image quality circuit 133 being applied to one or more parts of the image is described herein with reference to FIG. 14.
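Applying the scorer region by region might look like the following sketch, where the region detector and the scoring function (for example, a model trained as in the sketch above) are supplied by the caller and are assumptions of this illustration.

```python
def quality_score_map(image, regions, score_fn) -> dict:
    # regions: mapping of region name (e.g., "tongue", "molars") to an
    # (x, y, w, h) box produced by a region-of-interest detector.
    # score_fn: any per-patch quality scorer, e.g. the SVR sketch above.
    return {name: score_fn(image[y:y + h, x:x + w])
            for name, (x, y, w, h) in regions.items()}
```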

Referring to FIG. 2, illustrated is a series of images, with each image of the series including varying characteristics of an image, according to an illustrative embodiment. A first image 202 illustrates that the brightness of the image associated with the user’s 120 mouth is too dark 212. A second image 204 illustrates that the brightness of the image associated with the user’s 120 mouth is improved from the first image 202, but the brightness of the user’s 120 mouth is still too dark 214. A third image 206 illustrates that the brightness of the user’s 120 mouth 216 satisfies the image quality threshold associated with the characteristics of the image. For example, as shown, the user’s 120 mouth is bright and there is contrast between the teeth and the tongue.

Referring back to FIG. 1, the image capture application includes a protocol satisfaction circuit 106. The protocol satisfaction circuit 106 may evaluate the quality of a captured image (or a frame of a video data stream) with respect to the content of the image. The content of the image may include the prevalence, visibility, distinctiveness, and/or relevance of various teeth and/or features in the image. That is, the protocol satisfaction circuit 106 evaluates what is or is not visible (e.g., an absence or presence), the extent (e.g., a degree) of the visibility, an angle, an orientation, and the like.

The protocol satisfaction circuit 106 may evaluate the prevalence, visibility, distinctiveness, and/or relevance of features in the image using object detection. For example, the protocol satisfaction circuit 106 may evaluate the angle, visibility, and/or orientation of a user’s 120 facial features (e.g., teeth, lips, tongue, eyes, nose, mouth, chin).

The protocol satisfaction circuit 106 may employ any suitable object detection algorithm/model to detect the content of the image. In some embodiments, the protocol satisfaction circuit 106 may be applied to one or more parts of the image, as described herein with reference to FIG. 14. One example object detection model of the protocol satisfaction circuit 106 that can operate in real time (or near real time) is the “you only look once” (YOLO) model. The YOLO model employs boundary boxes and class labels to identify objects in an image. The YOLO model is trained using a training dataset including classes identified in training images. For example, an image may be labeled with particular classes (e.g., facial features, such as chin, eyes, lips, nose, teeth) of objects detected in the image. In operation, the YOLO model partitions an image into a grid and determines whether each grid cell contains a portion of a boundary box and a corresponding likelihood of the boundary box belonging to a particular class.
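A sketch of such a detector is shown below using the ultralytics package as one possible YOLO implementation; the weights file and its facial-feature classes are hypothetical, not artifacts described by the disclosure.

```python
from ultralytics import YOLO

# Hypothetical weights fine-tuned on facial-feature classes such as
# chin, eyes, lips, nose, and teeth.
model = YOLO("facial_features.pt")

def detect_facial_features(image, min_conf: float = 0.5) -> list:
    result = model(image)[0]
    detections = []
    for box in result.boxes:
        if float(box.conf) >= min_conf:
            detections.append({
                "label": result.names[int(box.cls)],  # class of the boundary box
                "bbox": box.xyxy[0].tolist(),         # boundary box corners
                "confidence": float(box.conf),        # likelihood for the class
            })
    return detections
```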

In one implementation, the protocol satisfaction circuit 106 may employ photogrammetry, for instance, to extract three-dimensional (3D) measurements from captured two-dimensional (2D) images. The protocol satisfaction circuit 106 may perform photogrammetry by comparing known measurements of facial features with measurements of facial features in an image. The lengths/sizes of various facial features include tooth measurements, lip size measurements, eye size measurements, chin size measurements, and the like. Performing photogrammetry results in the determination of a position, orientation, size, and/or angle of a facial feature in an image. For instance, the roll, pitch, yaw, and distance of the user’s 120 head may be determined using photogrammetry or one or more other algorithms.

In some configurations, the image capture application 125 may perform photogrammetry using measurements of average facial features (including teeth, chin, lips, eyes, nose) from one or more databases (e.g., server 110 memory 119B) and/or local memory 119A. In other configurations, the image capture application 125 may retrieve particular measurements of a user (e.g., measured when the user 120 was at a medical professional’s office) from local memory 119A and/or a database (e.g., server 110 memory 119B). The protocol satisfaction circuit 106 compares the known measurements of facial features with dimensions/measurements of the facial features in the image to determine the position, orientation, size, and/or angle of the facial feature in the image.
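For instance, a single facial feature of known size supports a pinhole-camera distance estimate along the lines of the sketch below; the reference tooth width and focal length are illustrative assumptions, not values from the disclosure.

```python
def estimate_distance_mm(known_width_mm: float, pixel_width: float,
                         focal_length_px: float) -> float:
    # Pinhole-camera relation: distance = focal length * real size / image size.
    return focal_length_px * known_width_mm / pixel_width

# Example: assuming an average upper central incisor is about 8.5 mm wide and
# spans 40 px in an image captured with a 1400 px focal length, the tooth is
# roughly 1400 * 8.5 / 40 = 297.5 mm from the camera.
distance = estimate_distance_mm(8.5, 40, 1400)
```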

The image capture application 125 includes a feedback selection circuit 105. The feedback selection circuit 105 may determine relevant feedback to provide to the user 120, based on the image quality (e.g., the characteristics of the image and the content of the image).

Feedback (e.g., operator/user instructions) is communicated to the user 120 to increase the probability of a subsequent image (or frame) being a high quality image (e.g., satisfying image quality thresholds, where the image quality thresholds include image quality thresholds associated with the characteristics of the image and image quality thresholds associated with the content of the image). The feedback may be communicated to the user 120 visually (e.g., on a screen of the user device 121), audibly (e.g., projected from a speaker of the user device 121), using haptics (e.g., vibrating the user device 121), or any combination thereof. In one implementation, the frequency of vibration may decrease (or increase) when the user 120 adjusts the user device 121 closer to a desired location (resulting in a higher quality image). In other implementations, the user feedback (e.g., the feedback communicated to the user) may indicate that the image is not optimal and/or is more optimal/less optimal than the previous image. In some implementations, memory 119 may store various user preferences associated with the user feedback. For example, a user preference may include only providing user feedback displayed on the user device 121 (e.g., not providing audio user feedback). A different user preference may include providing audio user feedback during certain hours of a day (e.g., from 8AM to 8PM) and providing haptic feedback during different hours of a day.
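One way to honor such stored preferences is sketched below; the preference keys and the 8AM-8PM default are assumptions drawn from the example above, not a specified data model.

```python
from datetime import datetime

def choose_feedback_channels(preferences: dict, now=None) -> list:
    # preferences, e.g.: {"display": True, "audio_hours": (8, 20), "haptic": True}
    now = now or datetime.now()
    channels = []
    if preferences.get("display", True):
        channels.append("display")                  # feedback shown on screen
    start, end = preferences.get("audio_hours", (8, 20))
    if start <= now.hour < end:
        channels.append("audio")                    # audible feedback hours
    elif preferences.get("haptic", False):
        channels.append("haptic")                   # vibration outside those hours
    return channels
```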

The feedback may be provided to the user based on unique user settings. For example, if the image capture application 125 determined that the user 120 has access to hardware (e.g., object detection is used to detect hardware in the image, or the user 120 responded to a prompt and indicated that the user 120 had hardware), then the feedback may incorporate the hardware. The image capture application 125 learns to provide feedback associated with different hardware based on a diverse training set (e.g., receiving images with the hardware, receiving inputs explicitly identifying hardware, and the like). Further, the feedback may be provided to the user 120 based on the region of the user 120, using the language of the user 120, and the like.

Referring to FIG. 3, an agent-based feedback selection model 300 is shown, according to an illustrative embodiment. The agent-based feedback selection model 300 may be considered a reinforcement learning model, in which a machine learning model uses agents to select actions to maximize rewards based on a policy network.

Agents 302a to 302m (hereinafter called “agents 302”) refer to a learner or trainer. The environment 304a to 304m (hereinafter called “environment 304”) refers to the quality of the image (e.g., the image characteristics and the image content). At each time step t (e.g., at each iteration), the agent 302 observes a state s_t of the environment 304 and selects an action from a set of actions using a policy 344. The policy 344 maps states and observations to actions. The policy 344 gives the probability of taking a certain action when the agent 302 is in a certain state. The possible set of actions includes possible user feedback responses. Using reinforcement learning, for example, given the current state of the environment 304, the agent 302 may recommend a particular user feedback or type of user feedback. In some embodiments, if the image quality score is low (e.g., the image quality threshold associated with image characteristics and the image quality threshold associated with the image content are both not satisfied, or the image quality threshold associated with image characteristics and/or the image quality threshold associated with the image content satisfy a low threshold), then the agent 302 may learn to recommend a significant user feedback. An example of significant user feedback may be “open your mouth very wide.” In contrast, regular user feedback (or simply “user feedback”) may be “open your mouth.”

The solution space (e.g., possible set of actions) may be arbitrarily defined and depend on the solution space considerations. For example, the solution space may be discretized such that the possible solutions are fixed rather than on a continuous range. For instance, the action space may include actions such as: “open your mouth”, “say cheese”, “move your tongue”, “add more light”, and the like. The action space may also include more complex schemes such as dual feedback instructions and/or dual step sizes for an explore/exploit approach. For example, the action space may include multiple feedback instructions such as “open your mouth wide and add more light”, “please back up and look towards the camera”, and the like. Additionally or alternatively, the action space may include such actions as “please open your mouth a little wider”, “please reduce the intensity of the light a little bit”, “please get much closer to the camera”, and the like.

In some embodiments, the solution space may represent a type of user feedback, and the image capture application 125 may select user feedback randomly or sequentially from a user feedback script (e.g., a dictionary of phrases) associated with the type of user feedback. The user feedback script may be stored in memory 119A of the user device 121 or may be retrieved from memory 119B of the server 110. The user feedback script may be predetermined phrases and/or instructions to be executed by the image capture application 125 when the feedback selection circuit 105 selects the particular type of user feedback. The user feedback script may improve the user experience by making the user feedback more relatable and/or user friendly (e.g., heterogeneous) as opposed to homogenous and static. Further, the user feedback script may be specific to the user 120, the user’s 120 language, the user’s 120 dialect, the user’s 120 age group, or other user preferences.

The feedback script associated with the type of user feedback may be categorized (grouped, or clustered) based on the user feedback type. Accordingly, the agent-based feedback selection model 300 selects the type of user feedback, and the image capture application 125 may select the user feedback communicated to the user 120 from the user feedback script.

Referring to FIG. 4, illustrated is an example of types of user feedback 402-408 and a corresponding user script 422-428 for each type of user feedback, according to an illustrative embodiment. For example, a type of feedback selected by the feedback selection circuit 105 may be the “add more light” user feedback type 408. Accordingly, in response to the user feedback type selected by the feedback selection circuit 105 (e.g., using the agent-based feedback selection model 300), the image capture application 125 selects user feedback communicated to the user 120 from the user feedback script 428 associated with the user feedback type “add more light” 408.

For example, using the script 428 associated with the user feedback type “add more light” 408, the image capture application 125 may output, using a speaker on the user device 121, “please look towards the light!” Additionally or alternatively, the image capture application 125 may instruct the user device 121 to turn on a flashlight on the user device 121.
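A minimal sketch of this type-to-script mapping follows; the dictionary contents are illustrative stand-ins for the stored user feedback script, not phrases specified by the disclosure.

```python
import random

# Hypothetical excerpt of a stored user feedback script keyed by feedback type.
FEEDBACK_SCRIPTS = {
    "open your mouth": ["Please open your mouth nice and wide!", "Say ahh!"],
    "add more light": ["Please look towards the light!",
                       "Could you face a window or turn on a lamp?"],
}

def select_user_feedback(feedback_type: str) -> str:
    # The agent-based model selects the type; a phrase is then drawn randomly
    # (or sequentially) from the script associated with that type.
    return random.choice(FEEDBACK_SCRIPTS[feedback_type])
```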

Referring back to FIG. 3, the solution space of the agent-based feedback selection model 300 may also be continuous rather than discrete. For example, the action space may include such actions as “move the phone two inches left”, “move the phone 45 degrees right”, “please get 30 centimeters closer to the camera”, and the like. In the event a continuous solution space is implemented, the agents 302 may need to train for longer such that the agents 302 can determine, for example, a type of user feedback and a severity (or degree) of change to improve the image quality.

As shown, the agent-based feedback selection model 300 may be an asynchronous advantage actor critic reinforcement learning model. That is, policy 344 is a global policy such that the agents 302 share a common policy. The policy 344 is tuned based on the value of taking each action, where the value of selecting an action is defined as the expected reward received when taking that action from the possible set of actions. In some configurations, the image capture application 125 may update the policy 344 using agents operating in other servers (e.g., via federated learning).

The policy 344 may be stored in a global model 332. Using a global model 332 allows each agent 302 to have a more diversified training dataset and eliminates a need for synchronization of models associated with each agent 302. In other configurations, there may be models associated with each agent, and each agent may calculate a reward using a designated machine learning model.

An agent 302 may select actions based on a combination of policy 344 and an epsilon value representative of exploratory actions and exploitation actions. An exploratory action is an action unrestricted by prior knowledge. The exploratory action improves an agent’s 302 knowledge about an action by using the explored action in a sequence resulting in a reward calculation. For example, an exploratory action is selecting a user feedback type that may not have been selected in the past. An exploitation action is a “greedy” action that exploits the agent’s 302 current action-value estimates. For example, an exploitation action is selecting a user feedback type that has previously resulted in a high reward (e.g., selecting the user feedback type resulted in a subsequently captured high quality image).

Using epsilon-greedy action selection, for example, the agent 302 balances exploratory actions and exploitation actions. The epsilon value may be the probability of exploration versus exploitation. The agent 302 may select an epsilon value and perform an exploitation action or an exploratory action based on the value of the epsilon and one or more exploitation and/or exploration thresholds. The agent 302 may randomly select an epsilon value, select an epsilon value from a predetermined distribution of epsilon values, select an epsilon value in response to the number of training epochs, select an epsilon value in response to one or more gradients, and the like. In some embodiments, as training progresses, exploitation actions may be leveraged to refine training. For example, the image capture application 125 may revise the epsilon value (or epsilon selection) such that the likelihood of the exploration action is higher or lower than the likelihood of the exploitation action. Additionally, or alternatively, the image capture application 125 may revise the exploitation action threshold and/or the exploration action threshold.
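Epsilon-greedy selection over the discrete feedback actions can be sketched as follows; the value table and the epsilon default are placeholders for the learned quantities, not values from the disclosure.

```python
import random

def epsilon_greedy(action_values: dict, epsilon: float = 0.1) -> str:
    # action_values: current value estimate for each candidate user feedback
    # action in the present state, e.g. {"open your mouth": 0.7, ...}.
    if random.random() < epsilon:
        return random.choice(list(action_values))        # exploratory action
    return max(action_values, key=action_values.get)     # exploitation action
```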

In response to selecting an action (or multiple actions) according to the epsilon value and policy 344, the environment 304 may change, and there may be a new state s_(t+1). The agent 302 may receive feedback indicating how the action affected the environment 304. In some configurations, the agent 302 determines the feedback. In other configurations, the image capture application 125 may provide feedback. For example, if a subsequent image received by the image capture application 125 is a high quality image, then the image capture application 125 can determine that the action resulting in the subsequent image was an appropriate action. That is, the image capture application 125 may determine a positive reward associated with selecting the action.

The agent 302 learns (e.g., reconfigures its policy 344) by taking actions and analyzing the rewards. A reward function can include, for example, R(s_t), R(s_t, a_t), and R(s_t, a_t, s_(t+1)). In some configurations, the reward function may be a user recommendation goodness function. For example, a reward function based on a user recommendation goodness function may include various quadratic terms representing considerations determined by a trained professional. That is, recommendations and other considerations used by a trained professional may be modeled into a user recommendation goodness function.

Each iteration (or after multiple iterations and/or steps), the agent 302 selects a policy 344 (and an action) based on a current state s_t and the epsilon value, and the agent 302 (or the machine learning model 332) calculates a reward. Each iteration, the agent 302 (or machine learning model 332) iteratively increases a summation of rewards. One goal of reinforcement learning is to determine a policy 344 that maximizes (or minimizes) the cumulative set of rewards, determined via the reward function.

The image capture application 125, for instance, weights the policy 344 based on the rewards determined at each step (or series of steps) such that certain policies 344 (and actions) are encouraged and/or discouraged in response to the environment 304 being in a certain state. The policy 344 is optimized by taking the gradient of an objective function (e.g., a reward function) to maximize a cumulative sum of rewards at each step, or after a predetermined number of steps (e.g., a delayed reward).

In some embodiments, the image capture application 125 may inject parameter noise into the agent-based feedback selection model 300. Parameter noise may result in greater exploration and a more successful agent-based feedback selection model 300 by adding noise to the parameters of the policy selection.

In some embodiments, the rewards at each step may be compared (e.g., on an iterative basis) to a baseline. The baseline may be an expected performance (e.g., an expected user recommendation type) or an average performance (e.g., an average user recommendation type based on responses of several trained professionals). For example, historic user recommendations may be associated with images received by the image capture application 125. Evaluating a difference between the baseline and the reward is considered evaluating a value of advantage (or advantage value). The value of the advantage indicates how much better the reward is than the baseline (e.g., instead of an indication of which actions were rewarded and which actions were penalized).

In an example of training using the agent-based feedback selection model 300, various trained professionals may determine the feedback that they would provide to a user associated with various training images. The user feedback determined by the trained professionals may be used as the baseline by the agents 302. The agents 302 may compare the selected user feedback determined using the agents 302 and the policy to the baseline user feedback to evaluate whether the action selected by the agents 302 should be punished or rewarded. In some implementations, the baseline user feedback may be assigned a score (e.g., +1), and other user feedback types may be assigned a score (e.g., using a softmax classifier). The degree of the reward/punishment may be determined based on the difference between the baseline user feedback score and the selected user feedback score.
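The scoring suggested by this passage might be sketched as follows, assuming a softmax over feedback-type scores; the exact scoring used against the professional baseline is not specified by the disclosure, so this is one plausible reading.

```python
import numpy as np

def advantage_vs_baseline(type_scores: np.ndarray, selected: int, baseline: int) -> float:
    # Softmax-normalize raw scores over the candidate user feedback types.
    probs = np.exp(type_scores - type_scores.max())
    probs /= probs.sum()
    # Positive when the agent's selected feedback outscores the baseline
    # feedback chosen by the trained professional; negative otherwise. The
    # magnitude sets the degree of the reward or punishment.
    return float(probs[selected] - probs[baseline])
```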

The image capture application 125 may iteratively train the policy until the policy satisfies an accuracy threshold based on maximizing the reward. For example, the agents 302 train themselves by choosing action(s) based on policies 344 that provide the highest cumulative set of rewards. The agents 302 of the machine learning model (e.g., the agent-based feedback selection model 300 executing in the feedback selection circuit 105) may continue training until a predetermined threshold has been satisfied. For instance, the agents 302 may train the machine learning model until a predetermined number of steps (or series of steps called episodes, or iterations) have been reached. Additionally, or alternatively, the agents 302 may train the machine learning model until the reward function satisfies a threshold value and/or the advantage value is within a predetermined accuracy threshold.

As shown, the image capture application 125 trains the machine learning model (e.g., the agent-based feedback selection model 300 executing in the feedback selection circuit 105) using, for example, asynchronous advantage actor critic reinforcement learning. In other embodiments, the image capture application 125 trains the agent-based feedback selection model 300 using other reinforcement learning techniques.

The image capture application 125 utilizes various asynchronous agents 302a to 302m, each associated with a corresponding environment, to tune a policy 344. The image capture application 125 may employ a GPU to instantiate multiple learning agents 302 in parallel. Each agent 302 asynchronously performs actions and calculates rewards using a global model (such as a deep neural network). In some embodiments, the policy 344 may be updated every step (or predetermined number of steps) based on the cumulative rewards determined by each agent 302. Each agent 302 may contribute to the policy 344 such that the total knowledge of the model 332 increases and the policy 344 learns how to select user feedback based on an image ingested by the image capture application 125. Each time the model 332 is updated (e.g., after every step and/or predetermined number of steps), the image capture application 125 propagates new weights back to the agents 302 such that each agent 302 shares a common policy 344.

Additionally or alternatively, the feedback selection circuit 105 may employ one or more lookup tables to select a user feedback response (or a type of user feedback). Lookup tables may be stored in memory 119, for example. In some implementations, one or more results of the image quality circuit 133 and/or the protocol satisfaction circuit 106 may map to a user feedback response. For instance, if the image quality circuit 133 determines that the image quality score satisfies a threshold (or falls within a range), then a user feedback response (or type of user feedback) may be selected using the lookup table.

In an example, a BRISQUE machine learning model employed in the image quality circuit 133 may determine that the image quality score inside of the user's 120 mouth is 80 (indicating a low quality image). Accordingly, the feedback selection circuit 105 may map the image quality score (and/or the location of the image quality score, such as the inside of the user's 120 mouth) to select user feedback (e.g., using the user feedback script) associated with a type of user feedback (e.g., "add more light"). That is, an image quality score of 80 inside the user's mouth may map to the type of user feedback "add more light." In a different example, an image quality score of 30 inside the user's mouth (indicating a higher quality image) may map to the type of user feedback "add a little more light."
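A lookup table of this kind might be sketched as follows in Python; the score ranges and feedback phrases here are illustrative and do not reflect the actual user feedback script:

```python
# Minimal sketch of a lookup-table mapping from a BRISQUE-style score
# range (lower = better) to a feedback phrase. Ranges and phrases are
# illustrative stand-ins.
from typing import Optional

FEEDBACK_TABLE = [
    ((0, 20), None),                      # high quality: no feedback needed
    ((20, 50), "add a little more light"),
    ((50, 100), "add more light"),
]

def select_feedback(score: float) -> Optional[str]:
    for (low, high), feedback in FEEDBACK_TABLE:
        if low <= score < high:
            return feedback
    return None

print(select_feedback(80))  # -> "add more light"
print(select_feedback(30))  # -> "add a little more light"
```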

In some embodiments, hardware may be used in conjunction with the image capture application 125. For example, an object detection circuit may detect objects in a video feed and/or detect objects in captured images. The image capture application 125 may determine, based on the detected object, to provide feedback to the user 120 using the detected hardware. For example, a user 120 in possession of stretching hardware may receive feedback from the image capture application 125 on how to better position the stretching hardware (e.g., place lips around the hardware, insert the hardware further into the user's mouth, stick out the user's tongue with the hardware in the mouth).

In some implementations, the image capture application 125 may recommend that the user use hardware to improve the quality of the image. For example, the image capture application 125 may recommend common household hardware (e.g., spoons, flashlights) to manipulate the environment of the image and/or the user's mouth. Additionally or alternatively, the image capture application 125 may recommend more sophisticated hardware (e.g., a stretcher, such as a dental appliance configured to hold open the user's upper and lower lips simultaneously to permit visualization of the user's teeth, and further configured to continue holding open the user's upper and lower lips in a hands-free manner after being positioned at least partially within the user's mouth, where the dental appliance includes a handle having two ends and a pair of flanges at each end of the handle). Additionally or alternatively, the image capture application 125 may prompt the user for information related to available hardware. For example, the image capture application 125 may ask the user 120 whether the user 120 has access to hardware (e.g., spoons, stretchers, flashlights, etc.). The user 120 may respond orally such that a microphone of the user device 121 captures the user's response, and/or the user 120 may respond using the screen of the user device 121 (e.g., interacting with a button on a GUI, entering text into a text field).

In some implementations, the image capture application 125 may be configured to capture several images for a particular downstream application. For example, an application of the server 110 may effectively generate a 3D model (or other parametric model) of a user's dentition given multiple angles of a user's mouth. Accordingly, the image capture application 125 may be configured to capture three high quality images of the user's mouth. In an example, the image capture application 125 may guide the user 120 to capture a high quality image of the user's mouth at a front-facing angle. However, the user 120 may capture an image of the user's mouth at a side angle.

In some implementations, the image capture application 125 may determine that the image of the user's mouth at the side angle is not the image of the user's mouth at the front-facing angle. The image capture application 125 may invoke the feedback selection circuit 105 to select feedback to guide the user 120 to the desired high quality image (e.g., the image at the front-facing angle). In other implementations, the image capture application 125 may determine that the image of the user's mouth at the side angle, while not the image of the user's mouth at the front-facing angle, is still a high quality image of the user's mouth at the side angle. That is, the image of the user's mouth at the side angle may be a high quality image with respect to the image characteristics (e.g., lighting, blur) and with respect to the image content.

If the image capture application 125 was configured to retrieve three high quality images of the user's mouth (one at a front-facing angle, one at a side angle, and one at a top-down angle), then the image capture application 125 may determine that the high quality image of the user's mouth at the side angle has already been captured and store the image in memory 119. That is, even though the image capture application 125 was guiding the user 120 to capture an image of the user's mouth at the front angle, the image capture application 125 will recognize that a high quality image of the user's mouth at a side angle was captured. Subsequently, the image capture application 125 may proceed with guiding the user 120 to capture a high quality image of the user's mouth at a front angle.

FIG. 5 is an interactive communication flow utilizing the image capture application 125, according to an illustrative embodiment. The image capture application 125 may ingest an image 502 received from the user device 121. For example, the user 120 may initialize the image capture application and capture a baseline image 502. Additionally or alternatively, the image 502 may be a video (e.g., a continuous stream of data).

In some implementations, the image capture application 125 may perform one or more preprocessing operations 504 on the image 502. For example, preprocessing operations 504 may include determining whether the image 502 contains a mouth. That is, the image capture application 125 may employ object detection algorithms trained to identify various facial features. For instance, the object detection algorithm may be trained to identify teeth, lips, tongue, nose, chin, ears, and the like. In some embodiments, the user 120 may capture an image 502 that does not include a portion of the user's mouth (e.g., the captured image may include the user's ears). Accordingly, the image capture application 125 may execute the interactive feedback provider 514 (employing the feedback selection circuit 105) to select feedback (e.g., using agents 302 in the agent-based feedback selection model 300) indicating that the user 120 should capture a new image that includes a portion of the user's 120 mouth.

Additionally or alternatively, preprocessing operations 504 may include parsing a video signal into video frames. The frames may be portions or segments of the video signal across the time series. For example, at time t=0, the image capture application 125 may capture a static snapshot of the video data, and at time t=2, the image capture application 125 may capture another static snapshot of the video data. The time between frames may be pre-established or dynamically determined. The time between frames may be static (e.g., frames are captured every 2 seconds) or variable (e.g., a frame is captured 1 second after the previous frame, a next frame is captured 3 seconds after the previous frame, and the like). In other embodiments, preprocessing operations 504 include normalizing the image 502, scaling the image, and/or converting the image into a greyscale image, among others.
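One possible way to parse a video signal into frames at a fixed interval is sketched below in Python, assuming the OpenCV (cv2) library is available; the 2-second spacing mirrors the static example above, and the function name is illustrative:

```python
# Minimal sketch of sampling frames from a video stream at a fixed
# interval, assuming OpenCV is installed (pip install opencv-python).
import cv2

def sample_frames(path: str, interval_s: float = 2.0) -> list:
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS unreported
    step = max(1, int(fps * interval_s))     # frames to skip between samples
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)             # static snapshot of the stream
        index += 1
    cap.release()
    return frames
```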

In some implementations, preprocessing operations 504 may include extracting features of the image 502. The image capture application 125 may perform feature extraction by applying convolution to the image 502 and generating a feature map of extracted features. Convolving the image 502 with a filter (e.g., a kernel) has the effect of reducing the dimensionality of the image 502.

Additionally or alternatively, the preprocessing operations 504 may include performing pooling operations on the extracted feature map. For example, applying a max pooling layer on the feature map detects the prominent features of the feature map. Additionally or alternatively, applying an average pooling operation averages the features of the feature map. Applying a pooling operation on the feature map has the effect of further downsampling the feature map. In some configurations, the preprocessing operation 504 may include a flattening operation, in which the image capture application 125 arranges a feature map (represented as an array) into a one-dimensional vector.
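The convolve-pool-flatten pipeline described above might look like the following PyTorch sketch; the image size, channel counts, and kernel sizes are illustrative assumptions:

```python
# Minimal sketch of the convolve -> pool -> flatten preprocessing
# pipeline; shapes and layer sizes are illustrative.
import torch
import torch.nn as nn

image = torch.randn(1, 1, 64, 64)        # one greyscale 64x64 image

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)
pool = nn.MaxPool2d(kernel_size=2)       # keeps the most prominent features

feature_map = conv(image)                # -> (1, 8, 62, 62), reduced dims
pooled = pool(feature_map)               # -> (1, 8, 31, 31), downsampled
flattened = torch.flatten(pooled, start_dim=1)  # -> (1, 7688) 1-D vector
print(flattened.shape)
```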

In some implementations, preprocessing operations 504 may include performing image segmentation (e.g., grouping pixels together with similar attributes, delineating objects in an image). For instance, particular teeth may be segmented using masks and/or edge detection algorithms such that the image capture application 125 may be used to evaluate the image quality of a particular tooth. For example, the machine learning architecture 506 may evaluate the image characteristics of the portion of the image containing the tooth and/or the tooth content of the image (e.g., whether the visibility of the tooth satisfies a threshold).

In some implementations, preprocessing operations 504 include performing pose estimation on the image 502. The image capture application may perform pose estimation using, for instance, bottom-up pose estimation approaches and/or top-down pose estimation approaches. For example, preprocessing operations 504 may implement an encoder-decoder architecture to estimate landmarks on an image.

Referring to FIG. 6, illustrated are a series of images 600-602 and corresponding landmark models 610-612, according to an illustrative embodiment. As shown, pose estimation may be performed to identify localized human landmarks using landmark models (or sets of landmarks) in an image or video frame. The landmark model 610 corresponding to image 600 and the landmark model 612 corresponding to image 602 indicate coordinates, angles, and features relevant to head angles, mouth angles, jaw angles, and/or visibility of teeth in the image. For example, in landmark model 610, landmark 616 may identify a mouth landmark, landmark 618 may identify a face landmark, and landmark 614 may identify teeth landmarks. In landmark model 612, landmarks 620 may identify teeth landmarks, landmark 622 may identify mouth landmarks, and landmarks 624 and 626 may identify face landmarks. In some embodiments, the pose estimation algorithms may be configured to identify landmarks at a high resolution by identifying and distinguishing face landmarks. For example, landmark 626 may identify a chin landmark instead of simply a face landmark. In some configurations, the image capture application 125 may display the marked images to a user 120.

Referring to FIG. 7, illustrated is a landmark model 702 of a user 120, according to an illustrative embodiment. As shown, the user 120 may observe from the landmark model 702 that the image is a high quality image based on the characteristics of the image (e.g., the brightness, sharpness, contrast) and the content of the image (e.g., teeth are identified/adequately distinguished using landmarks 704).

In the example, the teeth landmarks 704 are adequately distinguished because, at a prior point in time, the image capture application 125 communicated user feedback instructing the user 120 to move their tongue. The image capture application 125 may have provided that feedback to the user 120 by determining that the prior tongue landmark associated with a prior image was incorrect (e.g., the tongue landmark indicated that the tongue was covering an area of the mouth that should be identified by one or more teeth landmarks, i.e., the user's tongue was covering the user's teeth). In some implementations, the image capture application 125 may determine that various landmarks are incorrect (e.g., in a suboptimal position) by comparing average landmark models associated with high quality images to landmark models identified in a captured image. The average landmark models may be average landmark models of all users, average landmark models of similar users (e.g., similar users based on a demographic, users of the same age, users of the same gender, users of the same race), or the like. In other implementations, the image capture application 125 may compare a specific user landmark model (e.g., determined using a high quality image captured at a previous point in time, such as with certain hardware and/or with assistance from trained professionals) to landmark models identified in a captured image to determine landmarks that should be identified such that a type of user feedback may be selected.

Referring back to FIG. 5, in some implementations, the machine learning architecture 506 may include several machine learning models. For example, as shown, the machine learning architecture 506 includes the image quality evaluator 508, the protocol satisfaction evaluator 510, and the feedback selector 512. In other implementations, the machine learning architecture 506 may be a single machine learning model.

In an example implementation, the machine learning architecture 506 may be a reinforcement learning model such as the agent-based feedback selection model 300. For example, the input to the machine learning architecture 506 (e.g., the reinforcement learning model) may be the image 502, and the output of the machine learning architecture 506 may be user feedback and/or types of user feedback (as described herein, with reference to FIG. 3).

Additionally or alternatively, the machine learning architecture 506 may be a neural network. FIG. 8 is a block diagram of a simplified neural network model 800, according to an illustrative example. The neural network model 800 may include a stack of distinct layers (vertically oriented) that transforms a variable number of inputs 809 (e.g., image 502) ingested by an input layer 813 into an output 808 at the output layer 819 via one or more hidden layers 823 between the input layer 813 and the output layer 819.

The input layer 813 includes neurons 811 (or nodes) connecting to each of the neurons 815 in the hidden layer 823. The neurons 815 in the hidden layer 823 connect to the neuron 821 in the output layer 819. The output layer 819 determines output user feedback (or a type of user feedback) 808 using, for example, a softmax classifier. The output layer 819 may use a softmax function (or a normalized exponential function) to transform an input of real numbers into a normalized probability distribution over predicted output classes. For example, output classes may include various user feedback types. The neural network model 800 may learn to determine whether the image is a high quality image and classify/predict a type of user feedback (as described with reference to FIG. 3) in response to the quality of the image. In some embodiments, the user feedback predicted by the neural network model 800 may be to do nothing. That is, the image may be a high quality image.
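As a minimal illustration of the softmax classification described above, the following Python sketch normalizes a vector of real-valued scores into a probability distribution over feedback classes; the class names and logits are invented for demonstration:

```python
# Minimal sketch of a softmax over feedback classes, as the output
# layer 819 might apply. Class names and logits are illustrative.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    exps = np.exp(logits - logits.max())   # subtract max for stability
    return exps / exps.sum()

classes = ["add more light", "open mouth wider", "move camera closer", "do nothing"]
logits = np.array([0.2, 1.5, -0.3, 0.9])
probs = softmax(logits)                    # normalized probability distribution
print(classes[int(np.argmax(probs))])      # predicted feedback type
```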

Generally, neurons (811, 815, 821) perform particular computations and are interconnected to nodes of adjacent layers. Each of the neurons 811, 815, and 821 sums the values from the adjacent nodes and applies an activation function, allowing the neural network 800 to learn to predict user feedback.

The neurons 811, 815, and 821 are interconnected by algorithmic weights 817-1, 817-2, 817-3, 817-4, 817-5, 817-6 (collectively referred to as weights 817). The weights 817 are tuned during training to adjust the strength of the neurons. For example, the adjustment of the strength of a neuron facilitates the ability of the neural network 800 to learn non-linear relationships between the input image and a predicted output 808 user feedback. The neural network model 800 optimizes the algorithmic weights during training such that the neural network model 800 learns to make (select, generate, or provide) user feedback predictions/recommendations that mirror the recommendations of a trained professional.

FIG. 9 is a block diagram of an example system 900 using supervised learning, according to an illustrative embodiment. Supervised learning is a method of training a machine learning model (e.g., the neural network model 800 described in FIG. 8). Supervised learning trains a machine learning model using input-output pairs. An input-output pair is an input with an associated known output (e.g., an expected output).

The machine learning model 904 may be trained on known input-output pairs such that the machine learning model 904 can learn how to predict known outputs given known inputs. Once the machine learning model 904 has learned how to predict known input-output pairs, the machine learning model 904 can operate on unknown inputs to predict an output.

Training inputs 902 and actual outputs 910 may be provided to the machine learning model 904. Training inputs 902 may include historic user inputs (e.g., images captured by the image capture application, images captured by a trained professional). Actual outputs 910 may include actual user feedback and/or types of user feedback. Actual user feedback may be feedback determined by one or more trained professionals in response to evaluating the corresponding image (e.g., the corresponding training input 902). The inputs 902 and actual outputs 910 may be received from the server 110. For example, memory 119B of the server 110 may store input-output pairs (e.g., images and corresponding actual user feedback).

In an example, the machine learning model 904 may use the training inputs 902 (e.g., images) to predict outputs 906 (e.g., a predicted user feedback) by applying the current state of the machine learning model 904 to the training inputs 902. The comparator 908 may compare the predicted outputs 906 to the actual outputs 910 (e.g., actual user feedback) to determine an amount of error or differences.

The error (represented by error signal 912) determined by the comparator 908 may be used to adjust the weights in the machine learning model 904 such that the machine learning model 904 changes (or learns) over time. The machine learning model 904 may be trained using a backpropagation algorithm, for instance. The backpropagation algorithm operates by propagating the error signal 912. The error signal 912 may be calculated each iteration (e.g., each pair of training inputs 902 and associated actual outputs 910), batch, and/or epoch, and propagated through all of the algorithmic weights in the machine learning model 904 such that the algorithmic weights adapt based on the amount of error. The error is minimized using a loss function. Non-limiting examples of loss functions include the square error function, the root mean square error function, and/or the cross entropy error function.

The weighting coefficients of the machine learning model 904 may be tuned to reduce the amount of error, thereby minimizing the differences between (or otherwise converging) the predicted output 906 and the actual output 910. The machine learning model 904 may be trained until the error determined at the comparator 908 is within a certain threshold (or until a threshold number of batches, epochs, or iterations have been reached). The trained machine learning model 904 and associated weighting coefficients may subsequently be stored in memory 119B or another data repository (e.g., a database) such that the machine learning model 904 may be employed on unknown data (e.g., data that is not training inputs 902). Once trained and validated, the machine learning model 904 may be employed during testing (or an inference phase). During testing, the machine learning model 904 may ingest unknown data to predict user feedback.
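The predict-compare-backpropagate loop of FIG. 9 might be sketched as follows in PyTorch; the model size, learning rate, stand-in data, and stopping threshold are illustrative assumptions, not the actual machine learning model 904:

```python
# Minimal sketch of the supervised loop of FIG. 9: predict, compare to
# the actual output, backpropagate the error, and adjust the weights.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()              # one of the loss options above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(8, 16)                  # stand-in training inputs 902
actual = torch.randint(0, 4, (8,))           # stand-in actual outputs 910

for epoch in range(100):
    predicted = model(inputs)                # predicted outputs 906
    error = loss_fn(predicted, actual)       # comparator 908 / error signal 912
    optimizer.zero_grad()
    error.backward()                         # backpropagate the error
    optimizer.step()                         # tune the weighting coefficients
    if error.item() < 0.05:                  # stop once within a threshold
        break
```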

Referring back to FIG. 5, in some implementations, the machine learning architecture 506 may be trained (e.g., as a single model or as multiple models) using average training data, that is, image data (e.g., mouth data) associated with multiple users. Additionally or alternatively, the machine learning architecture 506 may be trained using particular training data. For example, the machine learning architecture 506 may be trained according to a single user, regional/geographic users, particular user genders, users grouped with similar disabilities, users of certain ages, and the like. Accordingly, the machine learning architecture may be user-specific.

The image quality evaluator 508 may evaluate the quality of the image 502 with respect to image characteristics using the results of the image quality circuit 133. The protocol satisfaction evaluator 510 may evaluate the quality of the image 502 with respect to the image content using the results of the protocol satisfaction circuit 106.

For example, the protocol satisfaction circuit 106 may determine a size of the user's 120 tooth based on a captured image 502. The protocol satisfaction evaluator 510 may determine, based on the size of the tooth in the image 502 determined from the protocol satisfaction circuit 106, whether the size of the tooth in the image satisfies a tooth size threshold (e.g., an image quality content threshold).

In some implementations, various image quality content thresholds may exist for various purposes. For example, a first image quality content threshold regarding the size of a tooth may exist if a downstream application involves diagnosing the user 120. Additionally or alternatively, a second image quality content threshold regarding the size of the tooth may exist if a downstream application involves generating a parametric model of the user's tooth. That is, different downstream applications may have different thresholds for the content of a high quality image. Accordingly, the protocol satisfaction evaluator 510 may apply various image quality content thresholds to the results of the protocol satisfaction circuit 106. Similarly, the image quality evaluator 508 may apply various image characteristic thresholds to the results of the image quality circuit 133.
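A minimal sketch of such per-application content thresholds follows; the dictionary keys, pixel units, and threshold values are hypothetical:

```python
# Minimal sketch of per-application content thresholds: each downstream
# purpose can demand a different minimum tooth size. Values illustrative.
CONTENT_THRESHOLDS = {
    "diagnosis": {"min_tooth_px": 120},
    "parametric_model": {"min_tooth_px": 200},
}

def content_satisfied(tooth_px: int, application: str) -> bool:
    return tooth_px >= CONTENT_THRESHOLDS[application]["min_tooth_px"]

print(content_satisfied(150, "diagnosis"))         # True
print(content_satisfied(150, "parametric_model"))  # False
```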

The threshold analyzer 511 may evaluate the outputs of both the protocol satisfaction evaluator 510 and the image quality evaluator 508. In some configurations, if both the protocol satisfaction evaluator 510 and the image quality evaluator 508 determine that the image is a high quality image (e.g., with respect to the image content and the characteristics of the image, respectively), then the downstream application 516 will receive the image 502 (or the preprocessed image resulting from the image preprocessing operations 504).

In other configurations, no predetermined number of images or amount of data may be specified. For example, the downstream application 516 may receive image 502 data (or the preprocessed image resulting from the image preprocessing operations 504), and/or data resulting from the machine learning architecture (e.g., image characteristics determined by the image quality circuit 133 of the image quality evaluator 508, results from the image quality evaluator 508, image content determined by the protocol satisfaction circuit 106 of the protocol satisfaction evaluator 510, results from the protocol satisfaction evaluator 510, and the like). That is, one or more results from the machine learning models of the machine learning architecture 506 and/or results from the machine learning architecture 506 may be provided to the downstream application 516. The downstream application 516 may request data from the machine learning architecture 506 until the machine learning architecture 506 receives, for instance, a trigger (or other notification/command, indicated by communication 503) from the downstream application 516.

The downstream application 516 may also receive feedback from the interactive feedback provider 514 (based on the results of the feedback selection circuit 105), indicated by communication 505. The downstream application 516 may also provide information associated with the image quality (including information associated with the image characteristics and/or information associated with the image content) to the interactive feedback provider 514, indicated by communication 505. Accordingly, the interactive feedback provider 514 (and specifically the feedback selection circuit 105) may determine feedback in response to the data communicated by the downstream application 516. For example, the downstream application 516 may complete one or more objectives of the downstream application 516 (e.g., generate a 3D model (or other parametric model) of the user's teeth from a high quality 2D image of the user's teeth). In response to the downstream application 516 completing the one or more objectives, the interactive feedback provider 514 may communicate feedback to the user 120 (determined using the data of the downstream application) such as "Capture Successful!", "Great Job!", "Stop Capturing", or "Finished!" (or other phrases from the dictionary of phrases of the user feedback script).

In an illustrative example, the image capture application 125A of the user device 121 may transmit the image 502 (or a portion of the image identified as a high quality portion of the image) to the image capture application 125B of the server 110. In other embodiments, before the image capture application 125A of the user device 121 transmits the image 502 to the image capture application 125B of the server 110, the image capture application 125A may determine whether the image 502 satisfies one or more additional criteria (e.g., in addition to determining that the image 502 is a high quality image). For example, the image capture application 125 may perform pose estimation on the image 502 and determine whether the landmarks identified using pose estimation are suitable for the image capture application 125B of the server 110 or other downstream applications at the server 110.

In some embodiments, the machine learning architecture 506 (or the image quality evaluator 508 and/or the protocol satisfaction evaluator 510) may be used to predict an image quality (including image characteristics and/or image content) of a future image (or multiple future images/portions of images) using a historic image (or multiple historic images/portions of images). The future image may be an image that has not yet been captured by the image capture application 125. In these embodiments, the image capture application 125 may anticipate a movement of the user 120 using the predicted result(s) of the machine learning architecture 506 (or the image quality evaluator 508 and/or the protocol satisfaction evaluator 510). The anticipated movement of the user 120 may be fed to a downstream application.

In other embodiments, other methods may be used to estimate image quality (including image characteristics and/or image content) using historic images. For example, the machine learning architecture 506 may include a different machine learning model, such as a convolutional neural network (e.g., a Mesh R-CNN), specifically trained to predict an image content quality and/or an image characteristic quality (or a combination of an image content quality and an image characteristic quality) using image qualities and/or image content determined from historic images (e.g., by the machine learning architecture 506, the image quality evaluator 508, and/or the protocol satisfaction evaluator 510).

In an illustrative example, if a user 120 moves the user device 121 towards a light, a next image (e.g., a future image) may be brighter than the previous image. The image capture application may detect the trend toward brighter lighting and may anticipate that future image(s), which have not been captured yet, will be brighter than the currently captured image (or other historic images).

Downstream applications may include applications that incorporate control systems (e.g., proportional-integral-derivative (PID) controllers). A PID controller may be a controller that uses a closed-loop feedback mechanism to control variables relating to the image capture process. For example, the PID controller may be used to control an input/output circuit 128 (e.g., to generate instructions to move or autofocus a camera at the user device 121).
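A discrete PID controller of the kind referenced above can be sketched in a few lines of Python; the gains, error signal, and time step below are illustrative:

```python
# Minimal sketch of a discrete PID controller that a downstream
# application might use to drive camera focus/position toward a target.
class PID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt                       # accumulate error
        derivative = (error - self.prev_error) / dt       # rate of change
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.8, ki=0.1, kd=0.05)
correction = pid.update(error=0.3, dt=0.033)  # e.g., focus offset per frame
print(correction)
```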

Downstream applications of the server 110, such as downstream application 516 in FIG. 5 (or a downstream application executing on one or more other servers), may be configured to generate three-dimensional (3D) models/reconstructions of the image (or high quality portions of the image). Generating 3D models from 2D images is described in more detail in U.S. Patent Application No. 16/696,468, now U.S. Pat. No. 10,916,053, titled "SYSTEMS AND METHODS FOR CONSTRUCTING A THREE-DIMENSIONAL MODEL FROM TWO-DIMENSIONAL IMAGES," filed on Nov. 26, 2019, and U.S. Patent Application No. 17/247,055, titled "SYSTEMS AND METHOD FOR CONSTRUCTING A THREE-DIMENSIONAL MODEL FROM TWO DIMENSIONAL IMAGES," filed on Nov. 25, 2020, the contents of which are incorporated herein by reference in their entirety. Downstream applications of the server 110 may also be configured to generate parametric models of the image (or high quality portions of the image).

In some embodiments, the downstream application of the server generates a treatment plan (e.g., a series of steps used to correct or otherwise modify the positions of the user's teeth from an initial position to a final position or other intermediary positions) using the portions of images that are determined to be high quality portions. The downstream application 516 may determine a parametric model generated from the portions of the images that are determined to be high quality. For example, the downstream application 516 generating the treatment plan may enable manipulation of individual teeth parametric model(s) determined using one or more portions of high quality images. The manipulations may be performed manually (e.g., based on a user input received via the downstream application 516), automatically (e.g., by snapping/moving the teeth parametric model(s) to a default dental arch), or some combination thereof. In some embodiments, the manipulation of the parametric model(s) may show a final (or target) position of the teeth of the patient (e.g., user 120) following treatment via dental aligners. The downstream application may be configured to automatically generate a treatment plan based on the initial position (e.g., as reflected in the model corresponding to the portions of the captured high quality image) and the final position (e.g., following manipulation of the parametric model(s) and any optional adjustments).

Downstream applications of the server 110 (or other server) may also be configured to manufacture an aligner or other piece of hardware (e.g., a retainer). The downstream application may use a treatment plan, or one or more steps of the treatment plan (e.g., generated from a parametric model as described herein or otherwise received as an input), to fabricate an aligner. In some embodiments, before the aligner is fabricated, the treatment plan may be approved by a remote dentist/orthodontist. For example, a 3D printing system (or other casting equipment) may cast, etch, or otherwise generate physical models based on the parametric models of one or more stages of the treatment plan. A thermoforming system may thermoform a polymeric material to the physical models, and cut, trim, or otherwise remove excess polymeric material from the physical models to fabricate dental aligners (or retainers). The dental aligners or retainers can be fabricated using any of the systems or processes described in U.S. Patent Application No. 16/047,694, titled "Dental Impression Kit and Methods Therefor," filed Jul. 27, 2018, and U.S. Patent Application No. 16/188,570, now U.S. Pat. No. 10,315,353, titled "Systems and Methods for Thermoforming Dental Aligners," filed Nov. 13, 2018, the contents of each of which are hereby incorporated by reference in their entirety. The retainer may function in a manner similar to the dental aligners but to maintain (rather than move) a position of the patient's teeth. In some embodiments, the user 120 may be triggered (e.g., by a notification) to execute the image capture application such that high quality images (or portions of images) may be captured by the user 120 after the user's teeth have reached a final position.

Downstream applications of the server 110 (or other server) may also be configured to monitor a dental condition of the user 120. The downstream application may be configured to trigger the image capture application 125 to prompt the user 120 to capture high quality images (or portions of images) of the user's teeth at intervals (e.g., annual checks, monthly checks, weekly checks). The downstream application may scan the high quality image for dental conditions such as cavities and/or gingivitis. For example, the downstream application may use machine learning models or object detection models to determine whether one or more teeth in the high quality image are affected by a dental condition. The downstream application may also determine the degree of the dental condition (e.g., a quantitative or qualitative indication of the degree of gingivitis, for instance).

Downstream applications may also monitor a position of one or more teeth of the user 120 by comparing an expected teeth position (e.g., a final position of the treatment plan or other intermediate position of the treatment plan) to a current position of one or more teeth. The downstream application may monitor the user's teeth to determine whether the user's treatment is progressing as expected. The downstream application may be configured to trigger the image capture application 125 to prompt the user 120 to capture high quality images (or portions of images) of the user's teeth to determine a current position of the user's teeth (e.g., using a current high quality image of the user's teeth to generate a current parametric model of the user's teeth).

In some embodiments, downstream applications executed on the server 110 may be applications that are performed offline or that are associated with high latency (e.g., the user 120 may wait several minutes, hours, days, or weeks before receiving results from the downstream application).

If either the protocol satisfaction evaluator 510 or the image quality evaluator 508 determines that the image is not a high quality image, then the interactive feedback provider 514 may provide feedback to the user 120 (e.g., based on the results of the feedback selection circuit 105). The interactive feedback provider 514 may provide a closed feedback loop to the user 120 such that a new image 502 is captured after the user 120 receives feedback (and responds to the feedback) from the interactive feedback provider 514. Each of the images 502 received by the machine learning architecture 506 is independent. The interactive feedback provider 514 is configured to provide unique feedback for each image, where each image is captured and analyzed independently of other images. Further, each image may contain a unique set of features.

In response to receiving feedback from the interactive feedback provider 514, the subsequent image 502 received by the machine learning architecture 506 may be improved (e.g., a higher quality image with respect to at least one of the image characteristics of the image or the image content).

Referring to FIG. 10, illustrated is the interactive communication resulting from the implementation of the machine learning architecture of FIG. 5, according to an illustrative embodiment. The image capture application 125 may receive an image 502. The image capture application 125 ingests the image and applies the machine learning architecture 506. The quality of the image is evaluated by the image quality evaluator 508 (implemented using the image quality circuit 133) to determine whether the characteristics of the image 502 satisfy one or more thresholds. The image quality evaluator 508 determines that the image characteristics satisfy the image quality thresholds associated with the image characteristics. The quality of the image is also evaluated by the protocol satisfaction evaluator 510 (implemented using the protocol satisfaction circuit 106) to determine whether the image content satisfies one or more thresholds. The protocol satisfaction evaluator 510 determines that the image is not a high quality image based on the image quality score not satisfying an image quality threshold associated with the image content. Accordingly, the feedback selector 512 (implemented using the feedback selection circuit 105) selects feedback to be communicated to the user via the interactive feedback provider 514. As shown, feedback 1022 is both displayed and audibly announced to the user 120. Feedback 1022 may communicate to the user 120 to adjust the user's lips.

The image capture application 125 receives a subsequent image 502 from the user 120. The subsequent image is ingested by the image capture application 125 and applied to the machine learning architecture 506. The quality of the image is evaluated by the image quality evaluator 508 again (implemented using the image quality circuit 133) to determine whether the image still satisfies the image quality thresholds associated with the image characteristics. The quality of the image is also evaluated by the protocol satisfaction evaluator 510 again (implemented using the protocol satisfaction circuit 106) to determine whether the image content satisfies the image quality threshold associated with the image content. As shown, responsive to the feedback 1022, the user 120 moves their lips 1004 such that the second image 502 satisfies the image quality thresholds (e.g., both the image quality thresholds associated with the image characteristics and the image quality thresholds associated with the image content). Indicator 1006 communicates to the user 120 that the second image is of higher quality than the first image.

FIG. 11 illustrates the interactive communication resulting from the implementation of the machine learning architecture of FIG. 5, according to another illustrative embodiment. The image capture application 125 may receive an image 502, as shown in 1102. The image capture application 125 ingests the image and applies the machine learning architecture 506. The quality of the image is evaluated by the image quality evaluator 508 (implemented using the image quality circuit 133) to determine whether the image characteristics satisfy one or more thresholds. The image quality evaluator 508 determines that the image characteristics satisfy the image quality thresholds associated with the image characteristics. The quality of the image is also evaluated by the protocol satisfaction evaluator 510 (implemented using the protocol satisfaction circuit 106) to determine whether the image content satisfies one or more thresholds. The protocol satisfaction evaluator 510 determines that the image is not a high quality image based on the image quality score not satisfying an image quality threshold associated with the image content. Accordingly, the feedback selector 512 (implemented using the feedback selection circuit 105) selects feedback to be communicated to the user via the interactive feedback provider 514. As shown, feedback 1104 is both displayed and audibly announced to the user 120. Feedback 1104 may communicate to the user 120 to adjust the size, distance, angle, and/or orientation of the user device 121 relative to the user 120. Accordingly, the interactive feedback provider 514 is able to communicate multiple instructions to the user 120 in response to a single input image 502.

The image capture application 125 receives a continuous data stream (e.g., video data). The image capture application 125 parses the video data into frames and analyzes the frames of the video as if the frames were images. Frames are applied to the machine learning architecture 506. The quality of the frame is evaluated by the image quality evaluator 508 (implemented using the image quality circuit 133) to determine whether the image characteristics satisfy the image quality thresholds associated with the image characteristics. The quality of the frame is also evaluated by the protocol satisfaction evaluator 510 (implemented using the protocol satisfaction circuit 106) to determine whether the image content satisfies the image quality threshold associated with the image content. As shown, responsive to the feedback 1104, and based on the continuous adjustments of the user device 121, the image capture application 125 may determine that a frame of the continuous data stream satisfies the image quality thresholds (e.g., both the image quality thresholds associated with the image characteristics and the image quality thresholds associated with the image content). Indicator 1106 communicates to the user 120 that a high quality image has been captured. In some implementations, the image capture application 125 displays the captured high quality image to the user 120.

FIG. 12 is an illustration of the interactive communication resulting from the implementation of the machine learning architecture of FIG. 5, according to another illustrative embodiment. The image capture application 125 receives a continuous data stream (e.g., video data). The image capture application 125 parses the video data into frames and analyzes the frames of the video as if the frames were images. Frames are applied to the machine learning architecture 506. The quality of the frame (image) is evaluated by the image quality evaluator 508 (implemented using the image quality circuit 133) to determine whether the image characteristics satisfy the image quality thresholds associated with the image characteristics. The image quality evaluator 508 determines that the image characteristics satisfy the image quality thresholds associated with the image characteristics. The quality of the image is also evaluated using the protocol satisfaction evaluator 510 (implemented using the protocol satisfaction circuit 106) to determine whether the image content satisfies the image quality threshold associated with the image content. The protocol satisfaction evaluator 510 determines that the frame is not a high quality frame based on the image quality score not satisfying an image quality threshold associated with the image content. Accordingly, the feedback selector 512 (implemented using the feedback selection circuit 105) selects feedback to be communicated to the user via the interactive feedback provider 514. As shown, feedback 1202 is displayed to the user 120.

In one embodiment, as shown in image 1204, the user 120 responds to the feedback 1202 by opening the user's mouth more, shifting the position of the mouth, adjusting the angle of the mouth, and moving the user device 121 farther away. Continuous streams of data are analyzed by the image capture application 125, resulting in new feedback 1206.

In another embodiment, as shown in image 1204, feedback 1202 can be provided to the user 120 by displaying one or more objects (or symbols, colors) such as a crosshair 1209 and a target object 1210, which are displayed on the user interface of the user device 121. The objects may have any of one or more colors, transparencies, luminosities, and the like. For example, the crosshair 1209 may be a first color and the target object 1210 may be a second, different color. In some embodiments, only one object/symbol may be displayed to the user 120 (e.g., only the crosshair 1209 or the target object 1210). In other embodiments, both objects/symbols are displayed to the user 120 such that the user 120 is guided to match the objects (e.g., overlay the crosshair 1209 onto the target object 1210). Continuous streams of data are analyzed by the image capture application 125, resulting in adjusted/moved crosshair 1209 positions and/or target object 1210 positions.

The crosshair 1209 and/or target object 1210 may prompt the user 120 to adjust the size, distance, angle, and/or orientation of the user device 121 relative to the user 120 in such a way that the crosshair 1209 is moved toward the target object 1210. The crosshair 1209 and/or target object 1210 may also prompt the user 120 to adjust the user's head, mouth, tongue, teeth, lips, jaw, and the like, in such a way that the crosshair 1209 is moved toward the target object 1210. The target object 1210 can be positioned on the image 1204 relative to an area or object of interest. As the user 120 adjusts the device 121 and/or the user's body, the crosshair 1209 may be moved and positioned such that the adjustment of the user device 121 and/or user 120 by the user 120 increases the image quality score. Additionally or alternatively, the target object 1210 may be moved and positioned such that the adjustment of the user device 121 and/or user 120 by the user 120 increases the image quality score. In one example, the target object 1210 may change into a different symbol or object (e.g., feedback 1208). The target object 1210 may also change color, intensity, luminosity, and the like. For example, at least one of the crosshair 1209 and the target object 1210 may change as the objects become closer to overlapping, or once the objects overlap a threshold amount. The crosshair 1209 and the target object 1210 can be overlaid onto the image 1204 using augmented reality methods. The one or more objects (e.g., the crosshair 1209 and/or the target object 1210) can be placed once or can be repeatedly adjusted during the image capture process.

The image capture application 125 continues to receive continuous data streams (e.g., video data). The image capture application 125 continuously parses the video data into frames and analyzes the frames of the video as images. Frames (images) are applied to the machine learning architecture 506. The quality of the image is evaluated by the image quality evaluator 508 (implemented using the image quality circuit 133) to determine whether the image characteristics satisfy the image quality thresholds associated with the image characteristics. The quality of the frame is also evaluated by the protocol satisfaction evaluator 510 (implemented using the protocol satisfaction circuit 106) to determine whether the image content satisfies the image quality threshold associated with the image content. As shown, responsive to the feedback 1206, and based on the continuous adjustments of the user 120/user device 121, the image capture application 125 determines that a frame (image) of the continuous data stream satisfies the image quality thresholds (e.g., both the image quality thresholds associated with the image characteristics and the image quality thresholds associated with the image content). Indicator 1208 communicates to the user 120 that a high quality image has been captured. In some implementations, the image capture application 125 displays the captured high quality image to the user 120.

Feedback 1202 and 1206 communicate to the user 120 to adjust the size, distance, angle, and/or orientation of the user device 121 relative to the user 120. Accordingly, the feedback selector 512 is able to communicate multiple instructions to the user 120.

Referring back to FIG. 5, in some implementations, regardless of whether the threshold analyzer 511 determines that the image quality thresholds are satisfied, the feedback selector 512 may be employed to select feedback (using the feedback selection circuit 105) for the user 120 based on the output of the image quality circuit 133 and/or the protocol satisfaction circuit 106. That is, feedback may be provided to the user before the image quality evaluator 508 and/or the protocol satisfaction evaluator 510 determine whether the image quality thresholds associated with the image characteristics and/or the image content are satisfied.

The image quality evaluator 508 and the protocol satisfaction evaluator 510 may be machine learning models applied to the same image 502 in parallel. In some implementations, the user device 121 may apply both the image quality evaluator 508 and the protocol satisfaction evaluator 510. In other implementations, the user device 121 may apply one machine learning model (e.g., the image quality evaluator 508) and the server 110 may apply a second machine learning model (e.g., the protocol satisfaction evaluator 510).

Additionally or alternatively, the image quality evaluator 508 and the protocol satisfaction evaluator 510 may be applied to the image in series. For instance, the image capture application 125 may evaluate the quality of the image using the image quality evaluator 508 and subsequently evaluate the quality of the image using the protocol satisfaction evaluator 510 (or vice-versa). FIG. 13 is an example operational flow employing the machine learning models in series, according to an illustrative embodiment.

Referring now to FIG. 13, at operation 1302, the user may perform an action such as initializing the image capture application 125 (e.g., 125A at the user device 121), capturing an image, and/or performing a movement or adjustment (e.g., mouth position, tongue position, head position, mouth angle, lip position, tongue angle, head angle, and the like).

In some implementations, if the image capture application 125 is initialized, the image capture application 125 may instruct a camera on the user device 121 to activate upon the initialization of the image capture application 125. In other implementations, the image capture application 125 may prompt the user 120 to open the camera on the user device 121 upon the initialization of the image capture application 125.

In yet further implementations, if the image capture application 125A at the user device 121 is already initialized, the image capture application 125 (either at the user device 121 or the server 110) may capture an image in response to the action of the user 120. For example, the user 120 may instruct the image capture application 125A to capture an image (e.g., by clicking a button or saying a capture command). Subsequently, the image capture application 125A will capture an image. In some embodiments, a timer is communicated (e.g., visually, on the display of the user device 121, or audibly) before the image capture application 125A instructs the camera to capture an image.

Additionally or alternatively, the image capture application 125A at the user device 121 may automatically capture a next image (or record streams of data using a video camera) after the user 120 has performed an action (e.g., moved). In some implementations, a sensor may be monitored by the image capture application 125 (either at the user device 121 or the server 110) to determine whether the user 120 has performed an action (e.g., moved). In other implementations, the image capture application 125A may wait a predetermined amount of time before capturing the next image. The image capture application 125 (either at the user device 121 or the server 110) may communicate a timer (e.g., visually, on the display of the user device 121, or audibly) before the image capture application 125A automatically instructs the camera to capture an image.

The image capture application 125 may receive one or more images in response to the activation of the camera. In some embodiments, video data, in the form of a continuous data stream received from the camera, may be analyzed by the image capture application 125. In other embodiments, the image capture application 125 may instruct the user 120 to capture a first baseline image. For instance, the user 120 may be prompted (using audio and/or text displayed on the user device) to capture an image of the user smiling.

At operation 1304, a machine learning model may be employed to determine an image quality score associated with a first criterion. For example, the image quality circuit 133 may determine an image quality score with respect to image characteristics (e.g., motion artifacts, blur, brightness, contrast, sharpness). At operation 1306, the image capture application 125 may determine whether the first criterion is satisfied based on the results of the first machine learning model (e.g., the image quality circuit 133). In some implementations, the image capture application 125 may determine whether a portion of the image satisfies the first criterion, as described with reference to FIG. 14. If the first criterion is not satisfied, then relevant feedback may be determined at operation 1308. For example, the feedback selection circuit 105 may select user feedback based on the results determined by the image quality circuit 133. If the first criterion is satisfied, then the flow may proceed to operation 1310.

At operation 1310, a second machine learning model may be employed to determine an image quality score associated with a second criterion. The second machine learning model can be a different machine learning model than the first machine learning model. For example, the protocol satisfaction circuit 106 may determine an image quality score with respect to the image content (e.g., whether enough teeth are showing, whether the mouth is in the right position). The second machine learning model can also be the same machine learning model as the first machine learning model.

At operation 1312, the image capture application 125 may determine whether the second criterion is satisfied based on the results of the second machine learning model (e.g., the protocol satisfaction circuit 106). In some implementations, the image capture application 125 may determine whether a portion of the image satisfies the second criterion, as described with reference to FIG. 14. If the second criterion is not satisfied, then relevant feedback may be determined at operation 1316. For example, the feedback selection circuit 105 may select relevant user feedback based on the results of the protocol satisfaction circuit 106. If the second criterion is satisfied, then the flow may proceed to operation 1318. That is, the flow proceeds to operation 1318 when both of the criteria have been determined to be satisfied (with respect to the image or a portion of the image). There may be more criteria or fewer criteria than the criteria described. For example, if there are two criteria (as shown), then the flow proceeds to operation 1318 when both the first criterion and the second criterion have been determined to be satisfied (with respect to the image or a portion of the image) at operations 1306 and 1312, respectively. In some embodiments, before proceeding to operation 1318, the image capture application may re-evaluate whether the first criterion is still satisfied at operation 1314.
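The serial flow of operations 1302-1318 might be sketched as the following Python loop; the scoring functions, thresholds, and callback names are stand-ins for the circuits described above:

```python
# Minimal sketch of the serial flow of FIG. 13: the second criterion is
# only checked once the first passes, and feedback is selected whenever
# a criterion fails. All callables and thresholds are illustrative.
def capture_loop(get_image, score_characteristics, score_content,
                 select_feedback, threshold_1=0.7, threshold_2=0.7):
    while True:
        image = get_image()
        if score_characteristics(image) < threshold_1:   # operations 1304/1306
            select_feedback("characteristics")           # operation 1308
            continue
        if score_content(image) < threshold_2:           # operations 1310/1312
            select_feedback("content")                   # operation 1316
            continue
        return image                                     # operation 1318
```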

At operation 1318, the image capture application 125 may perform an action associated with the high quality image. For example, if the data received by the first machine learning model was a continuous stream of data (e.g., a video feed), then the image capture application 125 may select the frame identified as the high quality image and store the frame/image in memory 119.

Additionally or alternatively, subsequent processing may be performed using the high quality image. For example, the image capture application 125 may compress the image (or otherwise transform/modify the image) or apply additional machine learning models to the image (e.g., subsequent object detection models). The image capture application 125 may also transmit the high quality image to the server 110 for further processing (e.g., to execute a next machine learning model to evaluate the same and/or different criteria, to execute a machine learning model to generate a parametric model from 2D data, to generate a treatment plan for the user 120, and the like).

In some implementations, one or more portions of the image may satisfy both the first and second criteria and be transmitted for further processing. That is, portions of the image that do not satisfy both the first and second criteria (e.g., have a low image quality score) may be discarded. Accordingly, only selected areas that are associated with specific image quality scores may be sent for further processing, while other areas having a low image quality score may be discarded. FIG. 14, as described herein, illustrates an example process for selecting and transmitting some areas of an image for further processing. Transmitting only the one or more portions of the image that satisfy both the first and second criteria may reduce the data size (e.g., data packets) and memory needed to perform the subsequent processing steps. For example, processing power and other computational resources are not consumed on portions of the image that are identified as low quality.
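As a minimal sketch of this portion-level filtering, the following Python snippet keeps only portions that satisfy both criteria (using the portion numbers of FIG. 14 with invented scores and an invented threshold):

```python
# Minimal sketch of keeping only image portions that satisfy both
# criteria and discarding the rest before transmission. Scores and
# the threshold are illustrative.
portions = [
    {"id": 1406, "characteristics": 0.90, "content": 0.80},
    {"id": 1408, "characteristics": 0.85, "content": 0.75},
    {"id": 1410, "characteristics": 0.90, "content": 0.40},  # too few teeth
]

THRESHOLD = 0.7
to_transmit = [p for p in portions
               if p["characteristics"] >= THRESHOLD and p["content"] >= THRESHOLD]
# Portions failing either criterion are discarded, saving bandwidth/compute.
print([p["id"] for p in to_transmit])   # -> [1406, 1408]
```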

In some embodiments, the frequency of the first machine learning model receiving input (e.g., evaluating the first criterion at operation 1304) is higher than that of the second machine learning model receiving input (e.g., evaluating the second criterion at operation 1310). For example, the image capture application 125 may generate feedback to improve the image with respect to the first criterion before attempting to improve the image with respect to the second criterion. Accordingly, the first machine learning model may be executed more often than the second machine learning model because the second machine learning model is executed only in the event the first criterion is satisfied. As discussed herein, the first criterion may be a criterion associated with image characteristics (e.g., determined using the image quality circuit 133) and the second criterion may be a criterion associated with image content (e.g., determined using the protocol satisfaction circuit 106).

Additionally or alternatively, the first criterion may be a criterion associated with the image quality, where the image quality includes both the characteristics of the image and the content of the image. That is, both the image quality circuit 133 and the protocol satisfaction circuit 106 may be employed by a first machine learning model (e.g., machine learning architecture 506 in FIG. 5) to determine whether the image quality satisfies a threshold.

The second criterion may be a criterion associated with different machine learning models/architectures in downstream applications (e.g., generating a parametric model). For example, the image capture application 125 may transmit data to the server 110 in response to determining that the received image is a high quality image. Subsequently, the server 110 may execute one or more downstream applications using one or more other machine learning models/architectures to evaluate the second criterion. The second machine learning model associated with evaluating the second criterion is employed at a frequency less than that of the first machine learning model/architecture associated with evaluating the first criterion at operation 1304.

FIG. 14 is an illustration of a process for transmitting one or more portions of high quality images for further processing and discarding one or more portions of low quality images, resulting from the implementation of the machine learning architecture of FIG. 5, according to an illustrative embodiment. The image capture application 125 may receive an image 502 as shown in 1402. The image capture application 125 may identify (e.g., using an object detection algorithm performed during an image preprocessing operation at 504) a mouth 1404 in the image 1402. As shown, a boundary box may be placed around the identified mouth 1404.

In some implementations, only the relevant portion of the image 1402 may be ingested by the image capture application 125 and applied to the machine learning architecture 506. For example, only the mouth 1404 may be processed by the machine learning architecture 506. The quality of the mouth 1404 is evaluated by the image quality evaluator 508 (implemented using the image quality circuit 133) to determine whether the characteristics of the mouth 1404 satisfy one or more thresholds. As shown, the image quality evaluator 508 determines that three portions (or parts, or regions) of the mouth 1404 (portion 1406, portion 1408, and portion 1410) satisfy the image quality threshold associated with the image characteristics. For example, the three portions 1406, 1408, and 1410 are shown to be well lit.
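As a toy illustration of this region-level characteristic check, the sketch below tiles a mouth crop into a grid and keeps the well-lit tiles; the grid size, brightness threshold, and function name are assumptions rather than the disclosed behavior of the image quality evaluator 508.

```python
"""Toy region-level lighting check over a mouth crop."""
import numpy as np


def well_lit_regions(mouth: np.ndarray, grid=(2, 3), min_brightness=80.0):
    # Split the mouth crop into grid tiles and keep the tiles whose
    # mean brightness clears the threshold (cf. portions 1406/1408/1410).
    h, w = mouth.shape[:2]
    rows, cols = grid
    regions = []
    for r in range(rows):
        for c in range(cols):
            tile = mouth[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            if tile.mean() >= min_brightness:
                regions.append((r, c))
    return regions


# Usage: a uniformly bright crop keeps every tile.
print(well_lit_regions(np.full((120, 180), 200, dtype=np.uint8)))
```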

In some implementations, only portions 1406, 1408, and 1410 are ingested by the protocol satisfaction evaluator 510 (implemented using the protocol satisfaction circuit 106) to determine whether the portions 1406, 1408, and 1410 satisfy one or more thresholds. In other implementations, the mouth 1404 may be ingested by the protocol satisfaction evaluator 510 to determine whether the mouth 1404 satisfies one or more thresholds. In yet other implementations, the image 1402 may be ingested by the protocol satisfaction evaluator 510 to determine whether the image 1402 satisfies one or more thresholds.
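These three ingestion options can be thought of as a configurable granularity setting. The following sketch is illustrative only; the enum and function names are hypothetical.

```python
"""Hypothetical granularity switch for what the protocol satisfaction
evaluator 510 ingests."""
from enum import Enum


class Granularity(Enum):
    PORTIONS = "portions"      # e.g., portions 1406, 1408, 1410 only
    MOUTH = "mouth"            # the mouth 1404 crop
    FULL_IMAGE = "full_image"  # the entire image 1402


def inputs_for_protocol_evaluator(image, mouth, portions, mode):
    # Select what the evaluator will ingest based on the configuration.
    if mode is Granularity.PORTIONS:
        return portions
    if mode is Granularity.MOUTH:
        return [mouth]
    return [image]
```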

The protocol satisfaction evaluator 510 may determine that portions 1406 and 1408 are high quality portions of the mouth 1404 based on the image quality score satisfying an image quality threshold associated with the image content. Additionally or alternatively, if the protocol satisfaction evaluator 510 receives the mouth 1404 or the image 1402, the protocol satisfaction evaluator 510 may identify portions 1406 and 1408 as high quality portions. By definition, other portions of the mouth 1404 and/or image 1402 (including portion 1410) may not be high quality portions. In this example, the protocol satisfaction evaluator 510 may determine that portion 1410 is not a high quality portion because not enough teeth are visible in the image 1402.

In some implementations, because portions 1406 and 1408 satisfy the thresholds of both the image quality evaluator 508 and the protocol satisfaction evaluator 510, portions 1406 and 1408 may be transmitted to a downstream application 516. As shown, portion 1410 may be discarded (or not further processed).

As a result of some portions of the mouth 1404 (e.g., portions 1406 and 1408) being determined to be high quality and some portions of the image 1402 being determined to be low quality, the feedback selector 512 (implemented using the feedback selection circuit 105) may select feedback to be communicated to the user 120 via the interactive feedback provider 514. The feedback selected may be weighted or biased to address (or improve) the one or more portions of the image that did not satisfy a high image quality threshold. For instance, because the portions 1406 and 1408 of the mouth 1404 were identified as high quality portions of the image 1402 (e.g., satisfying both the image quality threshold associated with the image characteristics and the image quality threshold associated with the image content), the feedback selector 512 may select feedback associated with improving the quality of other areas of the image 1402 (e.g., portion 1410). In some implementations, the feedback selection circuit 105 may decrease the weighting/bias for selecting feedback associated with improving some areas of the image 1402, like portions 1406 and 1408, because both portions 1406 and 1408 have already been identified as high quality portions of the image. Accordingly, the high quality portion(s) of the image may be stored in memory 119. The feedback selection circuit 105 may also increase the weighting/bias for selecting feedback associated with improving other areas of the image 1402, like portion 1410, because the area of the mouth 1404 bounded by portion 1410 has not been captured in a high quality image. That is, the feedback selector 512 may select feedback that instructs the user 120 to capture a next image that may improve the image quality score associated with one portion of the image (e.g., portion 1410) at the cost of other portions of the image (e.g., portions 1406 and 1408) based on the weighting/bias.
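A minimal sketch of this weighting behavior, assuming hypothetical region scores and feedback messages, might look as follows; the particular weighting scheme is one plausible choice, not the disclosed feedback selection circuit 105.

```python
"""Illustrative weighted feedback selection: down-weight regions
already captured at high quality, up-weight regions that were not."""


def select_feedback(region_scores, high_quality_regions, messages):
    """region_scores: {region: score in [0, 1]}; messages: {region: text}."""
    weights = {}
    for region, score in region_scores.items():
        weight = 1.0 - score               # worse regions get more weight
        if region in high_quality_regions:
            weight *= 0.1                  # already captured well; de-prioritize
        weights[region] = weight
    target = max(weights, key=weights.get)  # region most in need of improvement
    return messages[target]


# Usage: portion 1410 (not enough teeth visible) wins over 1406/1408.
print(select_feedback(
    {"1406": 0.9, "1408": 0.85, "1410": 0.4},
    {"1406", "1408"},
    {"1406": "ok", "1408": "ok",
     "1410": "Tilt your head up to show more teeth."},
))
```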

The embodiments described herein have been described with reference to drawings. The drawings illustrate certain details of specific embodiments that provide the systems, methods, and programs described herein. However, describing the embodiments with drawings should not be construed as imposing on the disclosure any limitations that may be present in the drawings.

It should be understood that no claim element herein is to be construed under the provisions of 35 U.S.C. § 112(f), unless the element is expressly recited using the phrase “means for.”

It is noted that terms such as “approximately,” “substantially,” “about,” or the like may be construed, in various embodiments, to allow for insubstantial or otherwise acceptable deviations from specific values. In various embodiments, deviations of 20 percent may be considered insubstantial deviations, while in certain embodiments, deviations of 15 percent may be considered insubstantial deviations, in other embodiments, deviations of 10 percent may be considered insubstantial deviations, and in some embodiments, deviations of 5 percent may be considered insubstantial deviations. In various embodiments, deviations may be acceptable when they achieve the intended results or advantages, or are otherwise consistent with the spirit or nature of the embodiments.

Example computing systems and devices may include one or more processing units each with one or more processors, one or more memory units each with one or more memory devices, and one or more system buses that couple various components including memory units to processing units. Each memory device may include non-transient volatile storage media, non-volatile storage media, non-transitory storage media (e.g., one or more volatile and/or non-volatile memories), etc. In some embodiments, the non-volatile media may take the form of ROM, flash memory (e.g., flash memory such as NAND, 3D NAND, NOR, 3D NOR, etc.), EEPROM, MRAM, magnetic storage, hard discs, optical discs, etc. In other embodiments, the volatile storage media may take the form of RAM, TRAM, ZRAM, etc. Combinations of the above are also included within the scope of machine-readable media. In this regard, machine-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions. Each respective memory device may be operable to maintain or otherwise store information relating to the operations performed by one or more associated modules, units, and/or engines, including processor instructions and related data (e.g., database components, object code components, script components, etc.), in accordance with the example embodiments described herein.

It should be noted that although the diagrams herein may show a specific order and composition of method steps, it is understood that the order of these steps may differ from what is depicted. For example, two or more steps may be performed concurrently or with partial concurrence. Also, some method steps that are performed as discrete steps may be combined, steps being performed as a combined step may be separated into discrete steps, the sequence of certain processes may be reversed or otherwise varied, and the nature or number of discrete processes may be altered or varied. The order or sequence of any element or apparatus may be varied or substituted according to alternative embodiments. Accordingly, all such modifications are intended to be included within the scope of the present disclosure as defined in the appended claims. Such variations will depend on the machine-readable media and hardware systems chosen and on designer choice. It is understood that all such variations are within the scope of the disclosure. Likewise, software and web implementations of the present disclosure may be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps, and decision steps.

The foregoing description of embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from this disclosure. The embodiments were chosen and described in order to explain the principles of the disclosure and its practical application to enable one skilled in the art to utilize the various embodiments and with various modifications as are suited to the particular use contemplated. Other substitutions, modifications, changes, and omissions may be made in the design, operating conditions, and arrangement of the embodiments without departing from the scope of the present disclosure as expressed in the appended claims.

What is claimed is:
1. A method comprising: receiving, by one or more processors coupled to non-transitory memory, a first image representing at least a first portion of a mouth of a user; executing, by the one or more processors, a first machine-learning architecture trained to generate a set of features from the first image; determining, by the one or more processors, based on the set of features, that the first image satisfies at least one criteria for executing a second machine-learning architecture based on the first image; and generating, by the one or more processors based on the first image satisfying the at least one criteria, a prompt indicating feedback for capturing a second image representing at least a second portion of the mouth of the user.
2. The method of claim 1, wherein determining that the first image satisfies the at least one criteria comprises determining that the first image represents the mouth of the user.
3. The method of claim 1, wherein determining that the first image satisfies the at least one criteria comprises determining that at least one of the first image represents the mouth of the user at a predetermined orientation or the first image represents one or more predetermined teeth of the user.
4. The method of claim 1, wherein determining that the first image satisfies the at least one criteria comprises determining that a composite quality score of the first image satisfies a threshold.
5. The method of claim 4, further comprising: executing, by the one or more processors, the first machine-learning architecture to generate a plurality of quality scores, each of the plurality of quality scores representing a quality of a respective region of the first image; and determining, by the one or more processors, the composite quality score based on the plurality of quality scores.
6. The method of claim 1, wherein the prompt comprises an indication for the user to capture the second image wherein the second image depicts the mouth of the user in a different orientation.
7. The method of claim 1, wherein the prompt comprises an indication for the user to capture the second image wherein the second image depicts additional teeth of the user.
8. The method of claim 1, wherein the prompt comprises an indication for the user to capture the second image wherein the second image depicts the mouth of the user in a different orientation.
9. The method of claim 1, the method further comprising: storing, by the one or more processors, the first image in the memory; automatically capturing and receiving, by the one or more processors, the second image after receiving the first image; and storing, by the one or more processors, the second image in the memory.
10. The method of claim 1, wherein the first image comprises a plurality of images representing at least the first portion of the mouth of the user.
11. The method of claim 1, further comprising receiving a plurality of initial images in serial representing at least the first portion of the mouth of the user until a specific initial image satisfies the at least one criteria, wherein the first image is the specific initial image.
12. A system comprising: one or more processors coupled to non-transitory memory, the one or more processors configured to: receive a first image representing at least a portion of a mouth of a user; execute a first machine-learning architecture trained to generate a set of features from the first image; determine, based on the set of features, that the first image satisfies at least one criteria for executing a second machine-learning architecture based on the first image; and generate a prompt indicating feedback determined based on the first image satisfying the at least one criteria, the prompt indicating feedback for capturing a second image representing at least a second portion of the mouth of the user.
13. The system of claim 12, wherein the one or more processors are further configured to determine that the first image satisfies the at least one criteria by determining that the first image represents the mouth of the user.
14. The system of claim 12, wherein the one or more processors are further configured to determine that the first image satisfies the at least one criteria by determining that at least one of the first image represents the mouth of the user at a predetermined orientation or the first image represents one or more predetermined teeth of the user.
15. The system of claim 12, wherein the one or more processors are further configured to determine that the first image satisfies the at least one criteria by determining that a composite quality score of the first image satisfies a threshold.
16. The system of claim 15, wherein the one or more processors are further configured to: execute the first machine-learning architecture to generate a plurality of quality scores, each of the plurality of quality scores representing a quality of a respective region of the first image; and determine the composite quality score based on the plurality of quality scores.
17. The system of claim 12, wherein the prompt comprises an indication for the user to capture the second image wherein the second image depicts the mouth of the user in a different orientation.
18. The system of claim 12, wherein the prompt comprises an indication for the user to capture the second image wherein the second image depicts additional teeth of the user.
19. The system of claim 12, wherein the prompt comprises an indication for the user to capture the second image wherein the second image depicts the mouth of the user in a different orientation.
20. The system of claim 12, wherein the one or more processors are further configured to: store the first image in the memory; automatically capture and receive the second image after receiving the first image; and store the second image in the memory.
21. The system of claim 12, wherein the first image comprises a plurality of images representing at least the first portion of the mouth of the user.
22. The system of claim 12, wherein the one or more processors are further configured to receive a plurality of initial images in serial representing at least the first portion of the mouth of the user until a specific initial image satisfies the at least one criteria, wherein the first image is the specific initial image.
23. A non-transitory memory containing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving a first image representing at least a portion of a mouth of a user; executing a first machine-learning architecture trained to generate a set of features from the first image; determining, based on the set of features, that the first image satisfies at least one criteria for executing a second machine-learning architecture based on the first image; and generating, based on the first image satisfying the at least one criteria, a prompt indicating feedback for capturing a second image representing at least a second portion of the mouth of the user.
24. The non-transitory memory of claim 23, the operations further comprising: receiving the second image; and generating, by the second machine-learning architecture, a 3D model of at least a portion of a dental arch of the user based on at least one of the first image or the second image.
25. The non-transitory memory of claim 23, wherein the first image comprises a plurality of images representing at least the first portion of the mouth of the user.
26. The non-transitory memory of claim 23, the operations further comprising receiving a plurality of initial images in serial representing at least the first portion of the mouth of the user until a specific initial image satisfies the at least one criteria, wherein the first image is the specific initial image.